I. INTRODUCTION

In file management, data compression and archiving can reduce
the cost of storing and transmitting data. In other words, the
collection and analysis of data can benefit from compression.
There are numerous data compression and archiving schemes.
Popular data compression software packages include StacPack,
ARC, BTLZ, PKZIP, Splay, SHRINK, DIET, PKLITE, ARJ, LHA, PAK,
ZOO, PKPAK, and LZEXE
[Ref.3,6,7,8,12,13,14,15,16,17,22,23]. Some of these are
intended solely for executable files, while others work well on
binary graphic files. Additionally, each package has its own
operating environment and performance edge; no two are
identical. The Naval Security Group Detachment in Pensacola,
Florida expressed its interest in evaluating publicly available
compression software [Ref.24]. It is therefore of interest to
compare the performance of each package in the Naval operating
environment.

In this thesis, three methods for compression and four methods
for compression with archiving are chosen for comparison. The
PKZIP package is examined for both compression and compression
with archiving. This thesis focuses on reversible data
compression: the original file can be completely recovered
from the compressed file.


The benefits of data compression are many. First, hardware 
costs can be cut back because of the reduced capacity 
requirement for disk drive units. Second, given a fixed 
amount of disk space, more data can be kept online. Third, 
the speed of effective data transfer can be increased while 
reducing costs when copying files to disks or tapes, sending 
data over communications equipment, and shipping data recorded 
on disks or tapes. Fourth, the amount of media (e.g. tapes) to 
archive the data offline can be reduced. Last, as a result of
the compression process, compressed files are no longer
directly readable; they therefore acquire some incidental
protection from casual unauthorized access [Ref.13], although
compression is not a substitute for encryption. The trade-off
for the benefits
is mainly in execution time. The more effective compression 
algorithms generally need more CPU overhead than the less 
effective ones [Ref.13]. The results of the experiments
conducted in this thesis show that a good archiving program
generally results in good performance in data compression.

This thesis is organized as follows. Chapter II discusses 
the generic compression algorithms while Chapter III examines 
the algorithms used in each software package. The main effort
of data compilation and analysis is presented in Chapter IV.
Concluding remarks can be found in Chapter V.


II. GENERIC COMPRESSION ALGORITHMS

In this chapter, several algorithms for data compression 
are introduced. These algorithms are already employed in 
commercial software. The compression ratios and archiving 
effectiveness of these commercial software packages will be
compared and analyzed in Chapter IV.


A. INTRODUCTION TO DATA COMPRESSION 

Data compression is often referred to as source coding. 
Information theory is defined as the study of efficient coding 
and its consequences in the form of speed of transmission and 
probability of error. Data compression may be viewed as a 
branch of information theory in which the primary objective is 
to minimize the amount of data to be transmitted [Ref.10].

With most file types, some recurring patterns of bytes or
words (redundancy) can be found. A compressed file exploits
this redundancy by replacing each pattern with a symbol that
tells the decompression program which pattern to restore at
that location. The simplest and most common pattern,
regardless of file type, is a string of repeating single
characters or binary words. Most often these are strings of
blanks which occur between words, statements, and paragraphs
in text files. Other forms of redundancy tend to be more
file-type specific. COBOL source code, for example, is
partially composed of a known set of reserved words which
occur with great frequency within each program.

Once all the redundancies have been detected, the encoding 
algorithms, static or dynamic, can be used to code these 
redundancies. There always remains a core of information which 
cannot be compressed further. A compressed file contains the 
information which distinguishes it from any other file. At
this point, the file cannot be further reduced without some
loss of information.

Most compression algorithms use a start-to-finish operation;
that is, the entire file must be processed as a single unit,
and the entire file must be decompressed in order to access
it. This scheme renders the use of data compression with
production files inconvenient. An additional drawback to
compressing information might be that compressed files are
more susceptible to corruption. Particularly with
start-to-finish algorithms, decompression requires a precise
sequence of operations, which is exactly the reverse of the
compression sequence. If this sequence is disrupted by a few
corrupted bits on the storage media, it is possible to lose
the remainder of the file. However, the reliability of
current storage hardware makes this risk rather small
[Ret wks. 

No single technique described in the following sections is
the best in all situations. Typically, a sophisticated
compression product will combine several of the following
methods as well as other techniques in the effort to extract
every last unnecessary bit out of a compressed file.


B. STATIC HUFFMAN CODING 

Huffman coding is based on the frequency of occurrence of
each symbol in the text, where a symbol is defined as a
particular sequence of bits. The most frequently used symbols
are assigned shorter binary patterns and less frequently used
symbols are assigned longer patterns.

A static method is one in which the mapping from the set of
messages to the set of codewords is fixed before transmission
begins so that a given message is represented by the same
codeword every time it appears in the message ensemble
[Ref.10].


Huffman's algorithm, expressed graphically, takes as input
a list of nonnegative weights $\{w_1, \ldots, w_n\}$ and
constructs a full binary tree (a binary tree is full if every
node has either zero or two branches) whose leaves are labeled
with the weights. When the Huffman algorithm is used to
construct a code, the weights represent the probabilities
associated with the source letters. Initially, there is a set
of singleton trees, one for each weight in the list. At each
step in the algorithm the trees corresponding to the two
smallest weights, $w_i$ and $w_j$, are merged into a new tree
whose weight is $w_i + w_j$ and whose root has two branches
that are the subtrees represented by $w_i$ and $w_j$. The
weights $w_i$ and $w_j$ are removed from the list, and
$w_i + w_j$ is inserted into the list. This process continues
until the weight list contains a single value. If, at any
time, there is more than one way to choose a smallest pair of
weights, any such pair may be chosen. In Huffman's paper the
process begins with a nonincreasing list of weights. This
detail is not important to the correctness of the algorithm,
but it does provide a more efficient implementation. The
Huffman algorithm is demonstrated in Figure 1 and Figure 2
[Ref.10].





Fig. 1. The List of Huffman Process. 


The Huffman algorithm determines the lengths of the
codewords to be mapped to each of the source letters $a_i$.
There are many ways of specifying the actual bits; it is
necessary only that the code have the prefix property. The
usual assignment entails labeling the edge from each tree to
its left branch with the bit 0 and the edge to the right
branch with 1. The codeword for each source letter is the
sequence of labels along the path from the root to the leaf
node representing that letter. The codewords that can be
generated from Figure 2, in order of decreasing probability,
are {01, 11, 001, 100, 101, 0000, 0001}.

Fig. 2. The Tree of The Huffman Process.

Clearly, this process yields a minimal prefix code.
Furthermore, the algorithm is guaranteed to produce an optimal
(minimum redundancy) code. Gallager has proved an upper bound
on the redundancy of a Huffman code equal to
$p_n + \log_2[(2 \log_2 e)/e] \approx p_n + 0.086$, where
$p_n$ is the probability of the least likely source message
[Ref.10]. Figure 3 shows the distribution for which the
Huffman code is optimal.
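
The merging procedure and the 0/1 branch labeling described
above can be sketched in a few lines of Python. This is a
minimal illustration under stated assumptions, not the
implementation used by any package studied in this thesis: the
function name `huffman_codes`, the use of a binary heap, and
the integer weights in the usage note are all choices made for
the example.

```python
import heapq
from itertools import count


def huffman_codes(weights):
    """Return one codeword (a string of '0'/'1') per input weight."""
    if len(weights) == 1:
        return ["0"]                      # a lone symbol still needs one bit
    tie = count()                         # tie-breaker so the heap never compares lists
    # Each heap entry: (tree weight, tie-breaker, indices of the leaves in that tree).
    heap = [(w, next(tie), [i]) for i, w in enumerate(weights)]
    heapq.heapify(heap)
    codes = [""] * len(weights)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)     # smallest weight
        w2, _, right = heapq.heappop(heap)    # second smallest weight
        for leaf in left:
            codes[leaf] = "0" + codes[leaf]   # label the left branch 0
        for leaf in right:
            codes[leaf] = "1" + codes[leaf]   # label the right branch 1
        # Merge the two trees; the new weight is the sum of the two.
        heapq.heappush(heap, (w1 + w2, next(tie), left + right))
    return codes
```

For the hypothetical weights [4, 2, 2, 1, 1], the codewords
returned have a weighted total length of 22, that is, an
average of 2.2 bits per symbol. Other implementations may break
ties between equal weights differently, so individual codeword
lengths can differ while the average stays the same.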

In addition to the fact that there are many ways of
forming codewords of appropriate lengths, there are cases in
which the Huffman algorithm does not uniquely determine these
lengths owing to the arbitrary choice among equal minimum
weights. For example, codes with codeword lengths of
{1,2,3,4,4} and {2,2,2,3,3} both yield the same average
codeword length for a source with probabilities
{.4, .2, .2, .1, .1}. Schwartz defines a variation of the
Huffman algorithm that performs "bottom merging", that is,
that orders a new parent node above existing nodes of the same
weight and always merges the last two weights in the list. The
code constructed is the Huffman code with minimum values of
maximum codeword length ($\max\{l_i\}$) and total codeword
length ($\sum l_i$).

Fig. 3. Distribution of Huffman Code.

Schwartz and Kallick describe an implementation of Huffman's
algorithm with bottom merging. The Schwartz-Kallick algorithm
and a later algorithm by Connell use Huffman's procedure to
determine the lengths of the codewords, and actual digits are
assigned so that the code has the numerical sequence property;
that is, codewords of equal length form a consecutive
sequence of binary numbers. Shannon-Fano codes also have the
numerical sequence property. This property can be exploited to
achieve a compact representation of the code and rapid
encoding and decoding [Ref.10].
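
To make the numerical sequence property concrete, the sketch
below assigns bit patterns from a list of codeword lengths so
that codewords of equal length are consecutive binary numbers.
It is an illustration only: the function name
`canonical_codes` and the convention of handing out the
shortest codewords first are assumptions, and the
Schwartz-Kallick and Connell algorithms may order the codewords
differently.

```python
def canonical_codes(lengths):
    """Assign codewords from codeword lengths so that equal-length
    codewords form a consecutive sequence of binary numbers."""
    # Process symbols by increasing length (ties broken by symbol index).
    order = sorted(range(len(lengths)), key=lambda i: (lengths[i], i))
    codes = [None] * len(lengths)
    code = 0
    prev_len = lengths[order[0]]
    for i in order:
        # Moving to a longer length appends zero bits to the running value.
        code <<= lengths[i] - prev_len
        codes[i] = format(code, "0{}b".format(lengths[i]))
        code += 1                 # the next codeword of this length is the next number
        prev_len = lengths[i]
    return codes
```

Applied to the two length sets quoted above, {2,2,2,3,3} gives
00, 01, 10, 110, 111 and {1,2,3,4,4} gives 0, 10, 110, 1110,
1111. Only the lengths need to be stored to rebuild such a
code, which is the compact representation referred to in the
text.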


C. LZ77 OPM/L TEXT COMPRESSION TECHNIQUE

Lempel-Ziv coding represents a departure from the classic
view of a code as a mapping from a fixed set of source
messages (letters, symbols, or words) to a fixed set of
codewords.

One of the popular data compression algorithms suggested by
Ziv and Lempel is LZ77, also known as OPM/L (Original Pointer
Macro restricted to Left Pointers) [Ref.2]. OPM/L uses a
sliding-window dictionary (SWD), a member of the same
Lempel-Ziv family as the Lempel-Ziv-Welch (LZW) algorithm. The
basic idea behind SWD is simple: substrings of the input
stream are stored in a dictionary. Each dictionary entry is
assigned a value. Then, if a later section of the input stream
is found within the dictionary, the value of this dictionary
entry is substituted in place of the longer original data.

The OPM/L scheme replaces a substring in a text with a 
pointer to a previous (left) occurrence of the substring in 
the text. The pointer represents the position and size of the 
substring in the original text. These restrictions make fast
single-pass decoding straightforward [Ref.2].


The LZ77 scheme restricts the reach of the pointer to
approximately the previous N characters, effectively creating
a "window" of N characters which is used as a sliding
dictionary. Pointers are chosen using a "greedy" algorithm
which permits single-pass encoding [Ref.2]. The advantages of
using a window are as follows:

1) The amount of memory required for encoding and decoding is
bounded by the size of the window, and is typically no more
than 8 kbytes;

2) For many types of text, and for sufficiently large N, the
window is a good dictionary for the substring which follows
because it will usually contain the same language, style, and
topic; and

3) All pointers can have fixed size fields.


An LZ77 encoder is parameterized by N, the size of the
"window", and F, the maximum length of a substring that may be
replaced by a pointer. Encoding of the input string proceeds
from left to right. At each step of the encoding, a section of
the input text is available in a window of N characters. Of
these, the first N-F characters have already been encoded and
the last F characters are the "lookahead buffer". For example
[Ref.2], if the string S =


abcabcbacbababcabc...
is being encoded with the parameters N = 11 and F = 4 and
character 12 is to be encoded next, the window is as shown in
Figure 4.

Initially the first N - F characters of the window are
(arbitrary) blanks, and the first F characters of the text are
loaded into the lookahead buffer.

The already encoded part of the window is searched to find
the longest match for the lookahead buffer. The match may
overlap with the lookahead buffer, but obviously cannot be the
lookahead buffer itself. In the example, the longest match for
the lookahead buffer "babc" is "bab", which starts at
character 10.

The longest match is then coded into a triple <i,j,a>,
where i is the offset of the longest match from the lookahead
buffer, j is the length of the match, and a is the first
character which did not match the substring in the window. In
the example, the output triple would be <2,3,'c'>. The window
is then shifted right j + 1 characters, ready for another
coding step.




A window of moderate size, typically N < 8192, can work
well for a variety of texts for the following reasons:


1) Common words and fragments of words occur regularly enough
in a text to appear more than once in a window. For example,
in English "the," "of," "pre-," "-ing"; in source code, the
keywords "while," "if," "then."

2) Specialist words tend to occur in clusters. For example, a
paragraph on a technical topic, or local identifiers in a
procedure of a source program.

3) Less common words may be made up of fragments of common
words.

4) Runs of characters are coded compactly. For example, k
blanks may be coded recursively as <?, ?, ' '> <1, k-1, ?>.

The amount of memory required for encoding and decoding is
limited to the size of the window. The offset (i) in a triple
can be represented in $\lceil \log_2 (N-F) \rceil$ bits, and
the number of characters (j) covered by the triple in
$\lceil \log_2 F \rceil$ bits. The time taken at each step is
bounded by N - F substring comparisons, which is constant, so
the time used for encoding is O(n) for a text of size n
[Ref.2].


Decoding is very simple and fast. The decoder maintains a 
window in the same way as the encoder but, instead of 
searching for a match in the window, it copies the match from
the window using the triple given by the encoder [Ref.2].
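
The encoding and decoding steps described above can be
sketched as follows. This is a straightforward illustration
rather than a tuned implementation. The parameters N and F are
those of the text; the function names, the `pad` argument for
the initial blanks, and the choice to limit a match to F - 1
characters (so that the unmatched character a still lies inside
the lookahead buffer) are assumptions made for the example.

```python
def lz77_encode(text, N=11, F=4, pad=" "):
    """Encode text as a list of triples (i, j, a)."""
    search = N - F                    # size of the already-encoded part of the window
    data = pad * search + text        # the window initially holds blanks
    pos = search                      # start of the lookahead buffer
    out = []
    while pos < len(data):
        # Leave room for the literal character that follows the match.
        max_len = min(F - 1, len(data) - pos - 1)
        best_off, best_len = 0, 0
        for off in range(1, search + 1):      # up to N - F candidate starts
            length = 0
            # The match may run on into the lookahead buffer itself.
            while (length < max_len
                   and data[pos - off + length] == data[pos + length]):
                length += 1
            if length > best_len:
                best_off, best_len = off, length
        out.append((best_off, best_len, data[pos + best_len]))
        pos += best_len + 1                   # shift the window by j + 1
    return out


def lz77_decode(triples, N=11, F=4, pad=" "):
    """Rebuild the text by copying matches out of the window."""
    search = N - F
    buf = list(pad * search)
    for off, length, ch in triples:
        for _ in range(length):
            buf.append(buf[-off])     # copying one character at a time handles overlap
        buf.append(ch)
    return "".join(buf[search:])
```

Note that the encoder's inner loops perform up to (N - F) x F
character comparisons per step, while the decoder only copies
characters, which matches the asymmetry in speed noted in the
text.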

The main disadvantage of LZ77 is that, although each
encoding step requires O(1) time, a straightforward
implementation can require up to (N - F)*F character
comparisons, typically on the order of several thousands. LZ77
is therefore best for the situation where a file is to be
encoded once (preferably on a fast computer) and decoded many
times, possibly on a small machine [Ref.2].

LZSS, a slightly modified version of LZ77 which improves
the compression ratios for a wide range of texts, was
developed by Storer and Szymanski. It offers very fast
decoding and requires comparatively little memory for coding
and decoding [Ref.18].

Storer and Szymanski presented a general model for data
compression that encompasses Lempel-Ziv coding. Their broad
theoretical work compares classes of 'macro schemes', where
macro schemes include all methods that factor out duplicate
occurrences of data and replace them by references either to
the source ensemble or to a code table. They also contribute
a linear-time Lempel-Ziv-like algorithm with better
performance than the standard Lempel-Ziv method [Ref.10].


D. ARITHMETIC CODING 

At present, most of the commonly used data compression
methods fall into one of two categories: dictionary-based
schemes or statistical methods. In the world of small
systems, dictionary-based data compression techniques seem to
be more popular. However, by combining arithmetic coding with
powerful modeling techniques, statistical methods for data
compression are actually able to achieve better performance
[ Ret sseo 

The method of arithmetic coding was suggested by Elias and
presented by Abramson [Ref.10] in his text on information
theory. Implementations of Elias' technique were developed by
Rissanen, Pasco, Rubin, and, most recently, Witten et al.

Arithmetic coding is based on the idea that each symbol is
not coded independently one after another as in a Huffman
code, but coded as a portion of the real interval between 0
and 1. Each symbol of the ensemble narrows this interval. As
the interval becomes smaller, the number of bits needed to
specify it grows. Arithmetic coding assumes an explicit
probabilistic model of the source. It is a defined-word scheme
that uses the probabilities of the source messages to
successively narrow the interval used to represent the
ensemble. A high-probability message narrows the interval
less than a low-probability message, and therefore contributes
fewer bits to the coded message. The method begins with an
unordered list of source messages and their probabilities. The
number line is partitioned into subintervals on the basis of
cumulative probabilities.

It is instructive to see an example [Ref.10]. Given source
messages {A,B,C,D,#} with probabilities {.2, .4, .1, .2, .1},
Table I shows the initial partitioning of the number line
[0, 1). The symbol A corresponds to the first 1/5 of the
interval [0,1), B is the next 2/5, and D is the subinterval of
size 1/5 which begins at 70% of the interval from the left
endpoint.


Table I The arithmetic coding model

Source message   Probability   Cum. Prob.   Range
A                .2            .2           [0, .2)
B                .4            .6           [.2, .6)
C                .1            .7           [.6, .7)
D                .2            .9           [.7, .9)
#                .1            1.0          [.9, 1.0)





When encoding begins, the source ensemble is represented
by the entire interval [0,1). For the ensemble AADB#, the
first A reduces the interval to [0,.2) and the second A to
[0, .04) (the first 1/5 of the previous interval, or
0.2 x 0.2 x [0, 1)). D further narrows the interval to
[.028, .036) (1/5 of the previous size, beginning 70% of the
distance from left to right, or 0.2 x 0.2 x [0.7, 0.9)). B
narrows the interval to [.0296, .0328) (2/5 of the previous
size, [.028, .036), beginning 20% and ending 60% of the
distance from left to right, [.028+.0016, .028+.0048)) and the
# yields a final interval of [.03248, .0328). The interval, or
alternatively any number i within the interval, may now be
used to represent the source ensemble.




Two equations may be used to define the narrowing process
described above:

newleft = prevleft + msgleft x prevsize        (1)

newsize = prevsize x msgsize                   (2)


Equation (1) states that the left endpoint of the new 
interval is calculated from the previous interval and the 
current source message. The left endpoint of the range 
associated with the current message specifies what percent of 
the previous interval to remove from the left in order to form 
the new interval. For character D in the above example 
(AADB#), the new left endpoint is moved by .7 x .04 (70% of 
the size of the previous interval). Equation (2) computes the 
Size of the new interval from the previous interval size and 
the probability of the current message (which is equivalent to 
the size of its associated range). Thus, the size of the 
interval determined by D is .04 x .2, and the right endpoint
is .028 + .008 = .036 (left endpoint + size).
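
Equations (1) and (2) translate directly into code. The sketch
below applies them to the ensemble AADB# using exact rational
arithmetic so that the endpoints can be compared with the
values quoted above. The names `MODEL` and `narrow` are
assumptions made for this example; a practical coder would use
fixed-precision integer arithmetic instead of fractions.

```python
from fractions import Fraction

# Model from Table I: symbol -> (left endpoint of its range, size of its range).
MODEL = {
    "A": (Fraction(0, 10), Fraction(2, 10)),
    "B": (Fraction(2, 10), Fraction(4, 10)),
    "C": (Fraction(6, 10), Fraction(1, 10)),
    "D": (Fraction(7, 10), Fraction(2, 10)),
    "#": (Fraction(9, 10), Fraction(1, 10)),
}


def narrow(ensemble, model=MODEL):
    """Return the final interval (left, right) for a sequence of messages."""
    left, size = Fraction(0), Fraction(1)      # start with the whole of [0, 1)
    for message in ensemble:
        msg_left, msg_size = model[message]
        left = left + msg_left * size          # equation (1)
        size = size * msg_size                 # equation (2)
    return left, left + size


left, right = narrow("AADB#")
print(float(left), float(right))
```

Under these assumptions the final interval is
[.03248, .0328), and stopping after the third message gives
[.028, .036), in agreement with the worked example.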

The size of the final subinterval determines the number of
bits needed to specify a number in that range. The number of
bits needed to specify a subinterval of [0, 1) of size s is

$$-\log_2 s.$$

Since the size of the final subinterval is the product of the
probabilities of the source messages in the ensemble,

$$s = \prod_{i=1}^{N} P(\text{source message } i),$$

where N is the length of the ensemble, we have

$$-\log_2 s = -\sum_{i=1}^{N} \log_2 P(\text{source message } i)
            = -\sum_{i=1}^{n} P(a_i) \log_2 P(a_i),$$

where n is the number of unique source messages
$a_1, a_2, \ldots, a_n$.


Thus, the number of bits generated by the arithmetic coding 
technique is exactly equal to the entropy. This demonstrates 
the fact that arithmetic coding achieves compression which is 
almost exactly that predicted by the entropy of the source. 

In order to recover the original ensemble, the decoder
must know the model of the source used by the encoder (e.g.,
the source messages and associated ranges) and a single number
within the interval determined by the encoder. Decoding
consists of a series of comparisons of the number i to the
ranges representing the source messages. For the example of
AADB#, i might be .0325 or any other number in
[.03248, .0328). The decoder uses i to simulate the actions of
the encoder. Since i lies between 0 and .2, the decoder
deduces that the first letter was A (since the range is
[0,.2)). The decoder can now deduce that the next message will
further narrow the interval in one of the following ways: to
[0,.04) for A, to [.04,.12) for B, to [.12,.14) for C, to
[.14,.18) for D, or to [.18,.2) for #. Since i falls into the
interval [0,.04), the decoder knows that the second message is
again A. This process continues until the entire ensemble has
been recovered [Ref.10].
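
The decoding loop can be sketched in the same style. The code
below is an illustration under stated assumptions: the names
`MODEL` and `decode` are invented for the example, the model is
the one in Table I, and the symbol # is assumed to mark the end
of the ensemble so that the decoder knows when to stop.

```python
from fractions import Fraction

# Model from Table I: symbol -> (left endpoint of its range, size of its range).
MODEL = {
    "A": (Fraction(0, 10), Fraction(2, 10)),
    "B": (Fraction(2, 10), Fraction(4, 10)),
    "C": (Fraction(6, 10), Fraction(1, 10)),
    "D": (Fraction(7, 10), Fraction(2, 10)),
    "#": (Fraction(9, 10), Fraction(1, 10)),
}


def decode(value, model=MODEL, terminator="#"):
    """Recover the ensemble represented by a number inside the final interval."""
    left, size = Fraction(0), Fraction(1)
    out = []
    while True:
        for message, (msg_left, msg_size) in model.items():
            lo = left + msg_left * size        # same narrowing as the encoder
            hi = lo + msg_size * size
            if lo <= value < hi:               # value falls in this message's range
                out.append(message)
                left, size = lo, hi - lo
                break
        else:
            raise ValueError("value lies outside [0, 1)")
        if out[-1] == terminator:
            return "".join(out)


print(decode(Fraction("0.0325")))
```

With i = .0325 the loop selects A, A, D, B and then #, at which
point it stops, reproducing the ensemble of the worked example.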


E. SHANNON-FANO CODING

Like the Huffman code, the Shannon-Fano code is a source
coding scheme known for its reasonable efficiency and
instantaneous decodability. Shannon-Fano coding is a
variable-length coding process. Before one decides the code
for each character, one has to determine the probability of
occurrence of each character and then arrange the source
messages in descending order of probability. Once this is
done, the character set (source messages) must be divided into
two subsets of equal, or almost equal, probability. The first
digit in one subset is assigned a binary 0 value while a
binary 1 is assigned as the first digit in the second subset.
This process of forming subsets is continued until the
character set is completely subdivided. Finally, a suffix bit
is added to each character in a two-character subset as
required to distinguish one character's binary composition
from the other character in the subset [Ref.10].

Table II Shannon-Fano Coding

To help understand Shannon-Fano coding, consider the 
following example [Ref.9: p.107-109]. It is assumed the 
character set contains 8 characters with the probabilities 
given in Table II. 

The third column of Table II is the character set arranged
in descending order based upon the probabilities. To form the
subsets, we have to group the characters so that the total
probabilities of the two subsets are equal or as nearly equal
as possible. We next assign binary 1's to one subset and
binary 0's to the other subset and continue the process until
all possible subsets are constructed. The fifth column of
Table II shows the process [Ref.9].

Table III An Example of a Completed Shannon-Fano Code
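
The subdivision procedure can be sketched recursively. The
code below is an illustration only: the function name
`shannon_fano` and the weights in the usage note are
hypothetical and are not the values of Tables II and III.
Assigning 0 to the first subset and 1 to the second is likewise
a convention chosen for the example.

```python
def shannon_fano(symbols):
    """symbols: list of (symbol, weight) pairs. Returns {symbol: codeword}."""
    # Arrange the source messages in descending order of probability.
    ordered = sorted(symbols, key=lambda pair: pair[1], reverse=True)
    codes = {symbol: "" for symbol, _ in ordered}

    def split(group):
        if len(group) < 2:
            return                          # a single character needs no more bits
        total = sum(weight for _, weight in group)
        running, best_cut, best_diff = 0, 1, None
        # Find the cut that makes the two subsets as nearly equal as possible.
        for k in range(1, len(group)):
            running += group[k - 1][1]
            diff = abs(total - 2 * running)
            if best_diff is None or diff < best_diff:
                best_cut, best_diff = k, diff
        for symbol, _ in group[:best_cut]:
            codes[symbol] += "0"            # first subset gets a 0
        for symbol, _ in group[best_cut:]:
            codes[symbol] += "1"            # second subset gets a 1
        split(group[:best_cut])
        split(group[best_cut:])

    split(ordered)
    return codes
```

For the hypothetical weights a=4, b=2, c=2, d=1, e=1 this
yields the codewords 0, 10, 110, 1110 and 1111. A character set
with only one member receives an empty codeword in this sketch,
a case a real implementation would have to handle separately.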


F. LZW CODING 

LZW is a modified version of Lempel-Ziv coding that
differs in the way in which the string table is stored and
accessed [Ref.10].

Welch described the implementation of this algorithm known 
as the LZW algorithm. It has the advantage of being adaptive. 
That is, the algorithm does not assume any advance knowledge 
of the properties of the input and builds the dictionary used 
for compression only on the basis of the input as it is read. 
This property is especially important in compression for
communication. This method contrasts with compression
algorithms which are based on advance knowledge of the
properties of the
input, e.g. Huffman algorithm (Ret. )o)- 

The LZW algorithm starts with a dictionary containing
entries for each character in the alphabet. The algorithm
scans the input, matching it with entries in the dictionary.
The matching stops at the first string Y that is not in the
dictionary, such that Y = X.a, where X is a string already in
the dictionary, "a" is a character and "." denotes the
concatenation operation. The compression algorithm then sends
the code for X (an index into the dictionary table) and
inserts Y into the dictionary. The string Y is called a
character extension of X. The encoding of the input continues
from the character "a" that follows X. Meanwhile, the decoder
builds an identical dictionary to the one built by the encoder
[Ref.19].

The entries for the LZW dictionary satisfy the two 
properties: 1) If a string X is in the dictionary then every 
prefix of X is also in the dictionary. 2) For every code sent 
by the encoder, a new entry is added to the dictionary. Since 
the dictionary size is finite and may be limited for practical 
reasons, the dictionary may fill up fast. The LZW algorithm 
then continues by encoding according to the existing 
dictionary without adding new entries to it. Experiments show 
that after a certain time, a significant decline in the 
compression ratio may be observed. This decline is typically 
due to a change in the properties of the text so that the 
dictionary is no longer appropriate. At this point the LZW
algorithm forgets the old dictionary and starts from scratch,
usually obtaining again a higher compression ratio [Ref.19].
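
The encoder and decoder described above can be sketched as
follows. This is a simplified illustration: the function names
are invented for the example, and the dictionary is assumed to
grow without limit, so the full-dictionary and restart
behaviour discussed above is not modelled.

```python
def lzw_encode(text, alphabet):
    """Return the list of dictionary indices sent by the encoder."""
    table = {ch: i for i, ch in enumerate(alphabet)}   # one entry per character
    out = []
    x = ""
    for a in text:
        y = x + a
        if y in table:
            x = y                      # keep extending the match
        else:
            out.append(table[x])       # send the code for X
            table[y] = len(table)      # insert Y = X.a into the dictionary
            x = a                      # continue from the character a
    if x:
        out.append(table[x])
    return out


def lzw_decode(codes, alphabet):
    """Rebuild the text while building the same dictionary as the encoder."""
    table = {i: ch for i, ch in enumerate(alphabet)}
    if not codes:
        return ""
    prev = table[codes[0]]
    out = [prev]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        else:
            # The code was created by the encoder's most recent step:
            # it must be the previous string plus its own first character.
            entry = prev + prev[0]
        out.append(entry)
        table[len(table)] = prev + entry[0]   # the entry the encoder added
        prev = entry
    return "".join(out)
```

For example, with the two-character alphabet "ab" the text
"abababab" is encoded as the five codes 0, 1, 2, 4, 1, and the
decoder recovers the original eight characters from them.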

It is helpful to look at the representation of the
dictionary as an ordered labeled rooted tree. Each edge
emanating from a vertex is labeled by a character of the
alphabet. A vertex represents the string obtained by
concatenation of all the characters along the path from the
root to the vertex. Thus all vertices on the path from the
root to a vertex representing a string X of the dictionary
represent prefixes of X and their corresponding strings are
also in the dictionary. Using this tree representation, if the
string of a vertex is deleted then the strings of all its
descendants must also be deleted. Note that when the
dictionary is full, the degree of a vertex is equal to the
number of times the corresponding entry was sent. Hence a leaf
represents an entry which was inserted into the dictionary but
was never sent. Depending on the nature of the text and size
of the dictionary, a commercial program called COMPRESS,
written in the C language and based on the LZW algorithm,
yields compression ratios of up to 60%. The "compression
ratio" is defined as the difference between the number of
characters in the original text and the compressed text
divided by the number of characters in the original text.

The dictionary constructed by the LZW algorithm contains
variable length strings of consecutive characters from the
text. Compression is obtained due to the replacement of the
text strings by the index to the corresponding dictionary
entry. For example, if the dictionary size is 1024 (2^10)
entries, it can encode any string in the dictionary using just
10 bits [Ref.19].
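
The two quantities just defined, the compression ratio and
the number of bits needed for a dictionary index, are simple
enough to state as code. The function names below are
assumptions made for this example.

```python
def compression_ratio(original_size, compressed_size):
    """(original - compressed) / original, as defined in the text."""
    return (original_size - compressed_size) / original_size


def index_bits(dictionary_size):
    """Bits needed for an index into a dictionary of the given size."""
    return max(1, (dictionary_size - 1).bit_length())


print(compression_ratio(1000, 400))   # a 1000-character text reduced to 400
print(index_bits(1024))
```

Under this definition a 1000-character text compressed to 400
characters has a ratio of 0.6 (60%), and a 1024-entry
dictionary needs 10-bit indices.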

III. COMMERCIAL OR PUBLIC ALGORITHMS


A. AN OVERVIEW OF COMPRESSION SOFTWARE 

As MS-DOS became the dominant operating system of personal 
computers, data storage capacities also increased. Hard disk 
drives with capacities of over 40 Mbytes became commonly 
available. Additionally, the 1200-Kbit/second modems are now 
available for less than $1000. Despite these advances in data 
storage and data communications, the sheer volume of data 
files continues to outpace the new technology's ability to 
provide adequate storage. 

With MS-DOS, the need for new data compression software
became evident. The first important application was System
Enhancement Associates' (SEA) ARC, which for many years was
the most popular program for data compression. Like many other
DOS compression programs, ARC was shareware: software
distributed through the online community without charge
[Ref.12].

Over time, better programs were introduced, notably
PKware's PKARC and PKZIP, and SEA's ARC lost its dominance in
the field [Ref.12].

Today there are at least half a dozen MS-DOS
archival/compression programs. PKZIP 1.10 may be the fastest
and most efficient of these programs, though NoGate
Consulting's PAK 2.6 also offers outstanding performance.
LHARC 1.13C, a popular compression program that originated in
Japan [Ref.12], is almost as good as PKZIP except that it runs
slower.

Another notable program is ZOO 2.01 [Ref.12]. Using a 
Lempel-Ziv compression algorithm, it was developed by R. Dhesi 
[Ref.23]. ZOO 2.01 neither runs fast nor compresses as well as 
other programs; its compression ratio for text files is about 
10% less than that of PKZIP. However, it has some unique 
advantages. Originated in Unix, it has since been ported to 
nearly every operating environment [Ref.12]. 

There are still many problems related to data compression 
that remain to be solved. For example, error detection and 
error correction are not incorporated in most software 
packages. 

Every time one compresses a file using a package, the 
package will confirm whether the compressed file has lost some 
of its data or not. Both compressed and uncompressed files can 
fail because a disk has marginal sectors or because of some 
"accident". If the file contains executable code, there's no 
point in fixing it - one can simply restore it from a backup. 
But if the file contains data, it is often possible and 
worthwhile to recover the rest, even though a few bits or a 
sector may be missing. When a compressed file goes bad, 
recovery is harder. Since the file is compressed, the damage
is multiplied. Naturally, the compression program should have
a decompress function; otherwise, there's no way one can
recover the file back to the original format.


Table IV Comparison of Software and the Algorithm Employed

Table IV summarizes the algorithms used by each software
package. The algorithm used by StacPack was not disclosed by
the company.


B. GENERAL DESCRIPTION OF EACH SOFTWARE
1. PKZIP 
This is one of the most widely used and best-known
commercial compression programs. Version 1.1, written by P.
Katz of PKWARE Inc., uses a proprietary dictionary-based
scheme. One must have PKUNZIP to extract compressed and
archived files. This version claims to be faster in
compressing very large files and exhibits good compression
efficiency.
a. Compression Algorithm 

PKZIP has three different kinds of compression
techniques: Shrinking, Reducing, and Imploding. As mentioned
in Table IV, they employ several algorithms such as LZW, LZ77,
and Shannon-Fano coding.

Shrinking is a Dynamic Ziv-Lempel-Welch compression 

algorithm with partial clearing. The initial code size is 9 

bits, and the maximum code size is 13 bits. Shrinking differs 

from conventional Dynamic Ziv-Lempel-Welch implementations in 
several aspects: 

1) The code size is controlled by the compressor, and is not
automatically increased when codes larger than the current
code size are created (but not necessarily used). The
decompressor should not increase the code size used
until the sequence 256,1 is encountered.

2) When the table becomes full, total clearing is not
performed. Rather, when the compressor emits the code
sequence 256,2 (decimal), the decompressor should clear all
leaf nodes from the Ziv-Lempel tree, and continue to use
the current code size. The nodes that are cleared from the
Ziv-Lempel tree are then reused, with the lowest code
value reused first, and the highest code value reused
last. The compressor can emit the sequence 256,2 at any
time [Ref.8].
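
The partial clearing described in item 2 can be illustrated with a
short Python sketch. It is not PKZIP code: it assumes the dictionary
is held as a map from each code to the code of its prefix string, and
the names partial_clear and parent are hypothetical.

```python
def partial_clear(parent):
    """Remove all leaf codes from an LZW dictionary (illustrative only).

    parent maps each dictionary code (257 and up) to the code of its
    prefix string. A leaf is a code that no other entry uses as a prefix.
    The freed codes are returned in ascending order, so that the lowest
    code value is reused first and the highest last.
    """
    used_as_prefix = set(parent.values())
    leaves = sorted(code for code in parent if code not in used_as_prefix)
    for code in leaves:
        del parent[code]
    return leaves


# Hypothetical dictionary: 258 extends 257, and 260 extends 258,
# so only 259 and 260 are leaves.
table = {257: 65, 258: 257, 259: 66, 260: 258}
free_codes = partial_clear(table)
```

After the call, the interior nodes 257 and 258 remain in the table,
and the freed codes are available for reuse starting from the lowest.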

Reducing is actually a combination of two distinct
algorithms. The first algorithm compresses repeated byte
sequences, and the second algorithm takes the compressed
stream from the first algorithm and applies a probabilistic
compression method. The probabilistic compression stores an
array of 'follower sets' S(j), for j=0 to 255, corresponding
to each possible ASCII character. Each set contains between 0
and 32 characters, to be denoted as S(j)[0],...,S(j)[m], where
m<32. The sets are stored at the beginning of the data area
for a reduced file, in reverse order, with S(255) first, and
S(0) last. The sets are encoded as
{ N(j), S(j)[0],...,S(j)[N(j)-1] }, where N(j) is the size of
set S(j). N(j) can be 0, in which case the follower set for
S(j) is empty. Each N(j) value is encoded in 6 bits, followed
by N(j) eight-bit character values corresponding to S(j)[0] to
S(j)[N(j)-1] respectively. If N(j) is 0, then no values for
S(j) are stored, and the value for N(j-1) immediately follows.
Immediately after the follower sets is the compressed data
stream. The compressed data stream can be interpreted for the
probabilistic decompression [Ref.8].
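
The layout of the follower sets can be sketched in Python as follows.
This is a minimal illustration, assuming that bits are packed starting
from the least significant bit of each byte; the names BitReader and
read_follower_sets are hypothetical and are not part of PKZIP.

```python
class BitReader:
    """Read unsigned integers from a byte string, least significant bit first."""

    def __init__(self, data):
        self.data = data
        self.pos = 0     # index of the next unread byte
        self.buf = 0     # bits fetched but not yet consumed
        self.count = 0   # number of valid bits in buf

    def read(self, nbits):
        while self.count < nbits:
            self.buf |= self.data[self.pos] << self.count
            self.pos += 1
            self.count += 8
        value = self.buf & ((1 << nbits) - 1)
        self.buf >>= nbits
        self.count -= nbits
        return value


def read_follower_sets(data):
    """Return a list in which entry j is the follower list S(j)."""
    reader = BitReader(data)
    sets = [[] for _ in range(256)]
    for j in range(255, -1, -1):      # S(255) is stored first, S(0) last
        size = reader.read(6)         # N(j), encoded in 6 bits
        sets[j] = [reader.read(8) for _ in range(size)]
    return sets
```

An empty set costs only the 6 bits of its N(j) field, which is why the
value for N(j-1) follows immediately when N(j) is 0.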

Imploding is actually a combination of two distinct
algorithms. The first algorithm compresses repeated byte
sequences using a sliding dictionary. The second algorithm is
used to compress the encoding of the sliding dictionary
output, using multiple Shannon-Fano trees [Ref.8].
b. General Format of Zipped File 
When we look at the list of archived files, there
are Length, Method, Size, Ratio, Date, Time, CRC-32, Attr, and
Name. These fields reflect the general format of PKZIP. The
overall zipfile format is

[local file header + file data]...
[central directory] end of central directory record

The local file header is composed of 30 bytes of fixed
fields, including the compression method, followed by a
variable-size file name and extra field. Each central
directory entry has 46 bytes of fixed fields, including the
file comment length, followed by a variable-size file name,
extra field, and file comment. The end of central directory
record consists of 22 bytes of fixed fields, including the
end of central directory signature, followed by a
variable-size zipfile comment.
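
As an illustration of the fixed part of the end of central directory
record, the following Python sketch unpacks its 22 bytes. The field
names are assumptions chosen for readability, the sample archive is
built with Python's zipfile module rather than PKZIP 1.1, and the
sketch only handles archives without a zipfile comment.

```python
import io
import struct
import zipfile

EOCD_FORMAT = "<IHHHHIIH"     # 22 bytes of fixed fields, little-endian
EOCD_SIGNATURE = 0x06054B50   # the bytes "PK\x05\x06"


def read_end_record(archive_bytes):
    """Parse the end of central directory record of a comment-free archive."""
    size = struct.calcsize(EOCD_FORMAT)
    (signature, disk_no, cd_disk, entries_on_disk, entries_total,
     cd_size, cd_offset, comment_len) = struct.unpack(
        EOCD_FORMAT, archive_bytes[-size:])
    if signature != EOCD_SIGNATURE:
        raise ValueError("end of central directory signature not found")
    return {"entries": entries_total, "cd_size": cd_size,
            "cd_offset": cd_offset, "comment_length": comment_len}


# Build a one-file archive in memory and inspect its end record.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("readme.txt", "hello")
archive_bytes = buffer.getvalue()
record = read_end_record(archive_bytes)
```

The cd_offset field points at the first central directory entry, so a
reader can locate every file without scanning the local headers.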

The Length is the original (uncompressed) size of each file,
and the Size is its compressed size. The compression method
depends upon the characteristics of the data file. The file
is only stored when it does not need compression or cannot
be compressed. The date and time are encoded in standard
MS-DOS format. The CRC-32 algorithm was contributed by David
Schwaderer and can be found in his book "C Programmers Guide
to NetBios" published by Howard W. Sams & Co. Inc. For every
file put in an archive, a CRC (Cyclic Redundancy Check) is
calculated, and it is recalculated when the file is
extracted. This is done to ensure data integrity when
archives are transmitted over communication links. The
lowest bit of the internal file attributes indicates whether
the data file is ASCII or binary. The size of the entire .ZIP
file header, including the file name, comment, and extra
field, should not exceed 64K [Ref.8].
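
The store-then-recheck use of the CRC can be sketched in Python. The
sketch relies on zlib.crc32 from the Python standard library, which
computes the same CRC-32 that the .ZIP format uses; the helper names
crc32_of and is_intact are hypothetical.

```python
import zlib


def crc32_of(data):
    """Return the CRC-32 of a byte string as an unsigned 32-bit value."""
    return zlib.crc32(data) & 0xFFFFFFFF


def is_intact(extracted, stored_crc):
    """Recompute the CRC on extraction and compare it with the stored one."""
    return crc32_of(extracted) == stored_crc


original = b"sample file contents"
stored_crc = crc32_of(original)          # calculated when the file is archived
damaged = b"sample file c0ntents"        # one byte altered in transmission
```

Here is_intact(original, stored_crc) holds, while the altered copy
fails the check.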
2. StacPack
a. PC Backup Program

Stac Electronics provides the 'Stacker' package for
compressing disk files in real time. This company also
provides data compression integrated circuit chips. The core
of 'Stacker' is a compression program, StacPack, and a
decompression program, StacUnpk. This program is also licensed
to vendors that are in the PC backup business. The backup
routines in such popular DOS programs as Norton Backup and PC
Tools are built on StacPack's algorithm [Ref.12].

| oy Ss OE Bie ans 4", 

StacPack's algorithm has proven to be so successful
that the Quarter-Inch Cartridge (QIC) Consortium has adopted
it as a standard, known as QIC-122, for QIC tape drives. With
StacPack, tape backup units, such as Colorado Memory Systems'
(CMS) Jumbo 250 and Tallgrass Technologies' FS 150e, can more
than double their storage capacity. Using StacPack, low-end
DC-2000 tapes, which normally hold only 40 Mbytes of data, can
store up to 80 Mbytes on a single tape. File server owners
can pack away 250 Mbytes on DC-2120 tapes that can otherwise
manage only 120 Mbytes.

Stac's method of data compression avoids the
disk-bound penalties of most DOS software, but it still slows
system performance due to the stealing of clock cycles.
Despite this, Stac's software speeds backups since the time
lost by compressing files is more than made up by the time
gained in writing smaller amounts of data to tape [Ref.12].

3. Compress
a. MS-DOS Ported Compress

This is the MS-DOS ported version of UNIX
'compress', by Tsai, which uses adaptive Lempel-Ziv coding.
The original UNIX 'compress' utility was written by Siam
Orost [Ref.15]. COMPRESS is a 16-bit LZW implementation in UNIX
operating systems. The PC implementation that uses 16 bits
takes up about 500K of RAM [Ref.21].

b. Modified Lempel-Ziv 

'Compress' uses the modified Lempel-Ziv algorithm.
Common substrings in the file are first replaced by 9-bit
codes, 257 and up. When code 512 is reached, the algorithm
switches to 10-bit encoding and continues to use more bits
until the limit specified by the -b flag is reached (default
16). The bits value must be between 9 and 16. The default can
be changed in the source to allow 'compress' to be run on a
smaller machine. After the bits limit is attained, 'compress'
periodically checks the compression ratio. If the ratio is
increasing, 'compress' continues to use the existing code
dictionary. However, if the compression ratio decreases,
'compress' discards the table of substrings and rebuilds it
from scratch. This allows the algorithm to adapt to the next
block of the file. How much each file is compressed depends on
the size of the input, the number of bits per code, and the
distribution of common substrings [Ref.6]. Typically, text
such as source code or English is reduced by 50-60% [Ref.10].
Compression is generally much better than that achieved by
Huffman coding or adaptive Huffman coding, and takes less time
to compute [Ref.6].
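
The growth of the code size described above can be sketched as a small
Python function. This is an illustration of the rule only, not code
taken from 'compress'; the name code_width and the parameter names are
hypothetical, and max_bits stands in for the value given with the -b
flag.

```python
def code_width(next_code, max_bits=16):
    """Return the number of bits needed to emit next_code.

    Codes start at 9 bits; each time the code value reaches the next
    power of two (512, 1024, ...) the width grows by one bit, until
    the max_bits limit is reached.
    """
    if not 9 <= max_bits <= 16:
        raise ValueError("bits must be between 9 and 16")
    width = 9
    while next_code >= (1 << width) and width < max_bits:
        width += 1
    return width
```

For example, code 511 still fits in 9 bits, code 512 requires 10 bits,
and with a limit of 12 bits the width never exceeds 12.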
4. ARJ221A 
a. ARJ Evolution 

ARJ version 2.21a is written by Robert K. Jung. It
uses the LZ77 brute force hashing algorithm that outperforms
all other LZ77 algorithms [Ref.14]. ARJ is influenced by the
design of LHARC, written by H. Yoshizaki. The early versions of
ARJ also adapted ideas from AR001 by H. Okumura, and some
portion of ARJ is derived from the AR source code [Ref.14].
b. General Feature of ARJ

ARJ is prototyped in ANSI C and only uses ANSI C
standard libraries. The MS-DOS production version of ARJ has
its compression, extraction, CRC, and output routines written
in assembler. For compressing, ARJ requires approximately 282
kbytes plus the memory necessary to store all of the path
names to be archived when using the default compression
method. For extracting, ARJ requires approximately 166 kbytes
or more. There is no limitation on the number of files that
can be stored in one archive. Examining the options of ARJ,
one may find 4 methods. The methods differ in the emphasis
placed on compression ratio versus execution speed.

The default input is binary mode, but one may set
the option to input text files for slightly better size
reduction. If one uses the 'text' mode for non-text files, ARJ
will prematurely stop input if it finds an embedded EOF
character (CTRL Z). This may produce a loss of data on binary
files. The file type "text" is only needed for future
cross-platform transfers of ARJ archives. It enables ARJ to
extract text files to the host file system with the text new
line sequence that is correct for that operating system. This
mode may produce slightly better size reduction, but extraction
of files compressed in text mode is significantly slower than
the extraction of binary files. In looking for 8-bit non-text
data, ARJ will look at the first 4096 bytes of the input file.
If ARJ finds any 8-bit data, it will automatically backtrack
and switch to binary mode for that particular file. In
addition, at the end of compressing the input file, if ARJ
finds that the input file size is not greater than 75 percent
of the binary file size (size on disk), ARJ will report an
error for that input file and increment the error count. This
helps avoid the problem of accidentally compressing executable
files with the text mode, which results in lost data. The
original file size reported by the "l" and "v" commands is the
actual number of bytes input during text mode compression.
This is usually the MS-DOS file size minus the number of
carriage returns in the file, since C text mode strips a file
of carriage returns [Ref.14].
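
The two text-mode safeguards can be sketched in Python as follows.
This is only an illustration of the checks as described in the text,
not ARJ's actual code; the function names are hypothetical, and
"8-bit data" is taken here to mean any byte with the high bit set.

```python
def choose_input_mode(data, requested="text", probe=4096):
    """Fall back to binary mode if 8-bit data appears in the first probe bytes."""
    if requested == "text" and any(byte >= 0x80 for byte in data[:probe]):
        return "binary"
    return requested


def text_read_is_suspicious(bytes_read, size_on_disk):
    """Flag a text-mode read that covers no more than 75% of the file.

    Integer arithmetic (4 * read <= 3 * size) avoids rounding problems.
    """
    return 4 * bytes_read <= 3 * size_on_disk
```

A file whose text-mode read stopped early at an embedded CTRL Z would
typically trip the second check, because far fewer bytes are read than
the size on disk.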

ARJ provides the capability of multiple volume
archives. In other words, it can archive files directly to
diskettes no matter how large or how numerous the input files
are. It is possible to archive a 10 megabyte file to several
diskettes and to recover the file directly from the diskettes.
Other archivers, however, require that one compress the large
file to hard disk or a large RAM drive and then slice the
compressed file to fit on diskettes. Recovering the original
files involves reassembling the compressed file on the hard
disk from the diskettes and then extracting the original files
from the reassembled compressed file. This feature makes ARJ
especially suitable for distributing large software packages
without concerns about fitting entire files on one
diskette. ARJ will automatically split files when necessary
and will reassemble them upon extraction without using any
extra disk space [Ref.3].

The ARJ archive data structure, with its header
structure and 32-bit CRC code, provides archive stability and
recovery capabilities. This software also provides a security
envelope facility by way of "locked" ARJ archives. A "locked"
ARJ archive cannot be modified by ARJ. This provides some
level of assurance to the user receiving a "locked" ARJ 
archive that the contents of the archive have not been 
tampered with. Data integrity checks contribute to the 
security of the ARJ "lock" [Ref.3]. 

5. LHA213
a. New Static Huffman Coding 

This is a revised version of LH113c.exe, by H. 
Yoshizaki, an archiver which was rather slow in execution but 
tight in compression ratio. This LHA software employs new 
static Huffman coding instead of older dynamic Huffman coding 
and is faster than LH113c in decompressing but requires more 
memory than LH113c introduced by K. Okubo. This has been known 
as 'LHARC' since it was introduced in 1989 [Ref.3]. 

b. General Feature of LHA 

LHA was chosen over runner-up ARJ because the
header it attaches to its self-extracting module requires only
1.9 Kbyte of RAM, and is highly customizable. That means the
SFX has features that make it especially helpful for users
distributing software. If one restricts the type of
compression used, PKZIP's 2.6 Kbyte is competitive, but
otherwise, the overhead in competing programs is 3 times as
great or more. LHA requires 384K or more of RAM [Ref.3].

This software is also set so as not to compress
files with the extensions .ARC, .LZH, .LZS, .PAK, .ZIP, and
.ZOO, which are already partially or fully compressed.

6. PAK251 
a. Distilling and Crushing 

This software uses the 'Distilled' and 'Crushed'
compression types among its 12 compression types, which
include Crunched, Squashed, Shrunk, Crushed, Imploded, and
Distilled. 'Distilled' employs Huffman coding and a sliding
window (LZ77), while 'Crushed' employs a Lempel-Ziv algorithm.

b. General Feature of PAK 

PAK is intended as a replacement for ARC by System
Enhancement Associates and PKARC and PKZIP by Philip Katz
[Ref.15]. While PKZIP 1.0 files are roughly comparable in size
to PAK files, PAK supports multiple compression types, more
archive formats, and more features. PAK creates and modifies
archive files which have the .PAK, .ARC, or .ZIP extension.
Files in an archive retain all of the information they had in
the directory, such as name, size, and date. In addition, each
file in an archive has a calculated CRC number, which allows
the detection of damage after events such as file transmission
via modem. The basic format of PAK has 1 byte of marker, 1
byte of version, 13 bytes of name, 4 bytes of size, 2 bytes of
date, 2 bytes of time, 2 bytes of CRC, and 4 bytes of length.
Basic archives end with a short header, containing just the
marker (26) and the end of file value (0) [Ref.15].
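
The basic PAK header fields listed above add up to 29 bytes, which can
be illustrated with Python's struct module. The sketch assumes
little-endian integers, a NUL-padded name, and the field order given
in the text; these details and the function names are assumptions, not
taken from the PAK documentation.

```python
import struct

# marker, version, name, size, date, time, CRC, length
HEADER_FORMAT = "<BB13sIHHHI"
ARCHIVE_MARKER = 26
END_OF_ARCHIVE = bytes([ARCHIVE_MARKER, 0])   # short header closing an archive


def pack_header(version, name, size, date, time, crc, length):
    """Build one 29-byte header; the name is padded with NUL bytes."""
    return struct.pack(HEADER_FORMAT, ARCHIVE_MARKER, version,
                       name.encode("ascii"), size, date, time, crc, length)


def unpack_header(raw):
    """Split a 29-byte header back into named fields."""
    marker, version, name, size, date, time, crc, length = struct.unpack(
        HEADER_FORMAT, raw)
    if marker != ARCHIVE_MARKER:
        raise ValueError("not an archive header")
    return {"version": version, "name": name.rstrip(b"\0").decode("ascii"),
            "size": size, "date": date, "time": time,
            "crc": crc, "length": length}


# Hypothetical entry, used only to show the round trip.
header = pack_header(2, "DEMO.TXT", 1200, 0x1A21, 0x6000, 0xBEEF, 3000)
fields = unpack_header(header)
```

The 2-byte CRC field shows that this basic format carries a 16-bit
check value, in contrast with the CRC-32 stored by PKZIP.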

PAK has a wide array of extra features that
includes comment writing, password protection, and a security
envelope. PAK's optional command shell makes use of pop-up
windows [Ref.15], which is the most pleasing interface
among the six programs evaluated here.

IV. PERFORMANCE ANALYSIS OF COMPRESSION SOFTWARE


A. EXPERIMENTAL SETUP 
We define the compression ratio as the size of the compressed
file divided by the size of the original file, such that the
smaller the compression ratio, the better the performance.
Some software may use different measures for indicating the
compression effectiveness, such as 'SF (Stowage Factor)', which
is the percentage of the reduction in file size by compression
[Ref.22]. In archiving, the total Stowage Factor is the
stowage factor for the archive as a whole, not counting
archive overhead. In this thesis, however, we use the
compression ratio defined above.
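
The two measures are complementary, as the following Python sketch
shows. The function names are hypothetical; the sample sizes are those
of the text file setup.inf compressed by PKZIP, as listed in
Appendix A.

```python
def compression_ratio(original_size, compressed_size):
    """Compressed size as a percentage of the original size (smaller is better)."""
    return 100.0 * compressed_size / original_size


def stowage_factor(original_size, compressed_size):
    """Percentage reduction in file size (larger is better)."""
    return 100.0 - compression_ratio(original_size, compressed_size)


# setup.inf: 50014 bytes originally, 12898 bytes after PKZIP (Appendix A).
ratio = round(compression_ratio(50014, 12898), 1)
factor = round(stowage_factor(50014, 12898), 1)
```

For this file the compression ratio is 25.8% and the stowage factor is
74.2%.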
1. How Files Are Tested 

There are many ways to classify data files. Generally
speaking, one can classify data files into ASCII type and
binary type. An ASCII file is a data or text file that
contains only characters coded from the standard ASCII
printable character set. A binary file is generated in machine
language form and ready to be executed by the CPU. Binary
files cannot be transmitted by protocols that handle pure
ASCII text.

This thesis classifies the data files into Text,
Executable, dBASE, and Image files since this classification
meets the practical need of data management and transmission,
especially in the military environment [Ref.24].

There are possibly many different types or formats in 
Image files: scanned picture, black-and-white image, color 
image, etc. In the compression analysis, however, they are 
all classified as Image type. 

For comparison, 3 compression methods (PKZIP,
StacPack, and Compress, the version of UNIX Compress ported
to DOS) and 4 archiving methods (ARJ221A, LHA213, PKZIP, and
PAK251) were examined. Note that the 4 archiving techniques
also contain the function of compression. For a wide-ranging
comparison, files sized from 500 bytes to 1 megabyte were
collected. The nominal file sizes were 0.5K, 1K, 1.8K, 3K, 5K,
8K, 13K, 20K, 40K, 70K, 120K, 190K, 300K, 500K, and 800K. The
margin of each size is ±20%, which made for a relatively even
and widely spread range. To test data compression packages, as
many files as possible were gathered;
however, 5 sample files for each of the 15 representative
sizes constituted each file type.
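
The grouping of sample files by size can be sketched in Python. The
sketch assumes a margin of 20% on either side of each nominal size,
which matches ranges such as 400-600 and 16K-24K listed in Appendix A,
and it assumes that 1K means 1000 bytes; the name size_class is
hypothetical.

```python
# The 15 nominal sizes, in bytes (assuming 1K = 1000 bytes).
NOMINAL_SIZES = [500, 1000, 1800, 3000, 5000, 8000, 13000, 20000, 40000,
                 70000, 120000, 190000, 300000, 500000, 800000]


def size_class(size_bytes):
    """Return the nominal size whose 80%-120% range contains size_bytes.

    The test 4*c <= 5*s <= 6*c is the integer form of 0.8*c <= s <= 1.2*c.
    None is returned for sizes that fall between two ranges.
    """
    for center in NOMINAL_SIZES:
        if 4 * center <= 5 * size_bytes <= 6 * center:
            return center
    return None
```

For example, a 568-byte file falls in the 0.5K class and a 21582-byte
file in the 20K class, while a 700-byte file belongs to no class.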

The files were collected from the computers at NPS.
They are mainly files from DOS-operated personal computers,
though some were from VAX and SUN workstations. The total
size of each type of file ranges from 4 megabytes up to 10
megabytes. In any event, compression or archiving software
was needed to reduce the time and effort required to collect
and manage those files.

Experiments were run on a 33-MHz IBM-compatible
486 desktop with 8 MB of extended RAM and a 100 MB hard disk.
The hard disk was formatted under MS-DOS 4.01. Sample files
were stored on the hard disk. Furthermore, these experiments
were conducted in each program's native (default) mode.

2. Sample Files Classification 

Text files include word processing documents, batch
files and source language programs and are usually ASCII files
as they contain only letters, digits and symbols. Most of the
files are from mathcad [Ref.25], matlab [Ref.26], wp51
[Ref.28], PSpice [Ref.27], and C++ [Ref.31]. Note that although
text files are generally human-readable, the compressed files
are generally not.

Executable files include machine language programs
ready to be loaded and executed in the computer. These
executable (binary) files may have some ASCII text in them as
string constants. A total of more than 8 megabytes of
executable files were obtained. These files are generally
found with the file extension .EXE or .COM. In contrast with a
.COM file, which is designed to work only in specific memory
locations, .EXE files are designed as relocatable files and
can reside in any memory location. Most of the executable
files were collected from DOS-operated computers. They can be
compressed with slightly larger (worse) ratios than text
files. Moreover, they need 3 times the processing time
required by ASCII text files.

A database is a collection of interrelated files that
are created and managed by a database management system (DBMS).
In the following discussion the word 'database' implies dBASE
IV because it is the most widely used database system for
personal computers, and its programming language and file
formats have become industry standards. Additionally, dBASE is
widely used in the U.S. Navy; therefore the compression
effectiveness of dBASE files should be studied separately.
Database files are usually not ASCII files since they contain
numbers in integer or floating point forms and many control
codes for tabulating purposes.

Due to the difficulty of obtaining a sufficient number of
dBASE files, some files were acquired from the example files of
dBASE IV, some files were purposely composed for different
sizes, and some were obtained through ftp (file transfer
protocol) over the Internet from public domains.

Computer graphics and image processing applications
create and process digital images. Images can be generated or
sensed before they are stored in computers. For storing and
maintaining pictures in a computer, images are represented in
either vector graphics or raster graphics. When circuits are
drawn in CAD (Computer Aided Design), vector graphics is used.
As one draws, each line of the image is stored as a vector
(two end points on a two dimensional matrix). Vector graphics
maintain the image as a series of lines. Unlike vector
graphics, raster (binary) graphics is used when objects are
"painted" on screen or are scanned, typically with 16 to 256
gray levels, into the computer. It is similar to
television where the picture image is made up of dots
(pixels).

The 10 megabytes of image files collected include
CAD files [Ref.29], DrawPerfect sample files [Ref.33],
business graphics files which yield graphics such as bar or pie
charts or scatter diagrams, files from commercial games, and
some Black/White and some colored images. Like Executable
files, Image files may include some text descriptions that
provide charts, tables, and special characters. Through ftp
some larger sized files (above 300K) were downloaded from
numerous universities and institutions.


B. EXPERIMENTAL RESULT ANALYSIS 
1. Text Files 

Fig. 5 shows the average compression ratios of PKZIP,
StacPack, and Compress on collected text files. PKZIP ranks
best when applied to text files.

Text files in the range of [10K, 100K] benefit the
most since the compression ratios are lower than those of
other file sizes. PKZIP stood out at 21.4% for 190K. Looking
at each sample file (see Appendix A), one finds a PSpice
library file sized 135K that was compressed to 7% of its
original size using PKZIP. This is no surprise since there are
many blanks in the library file. PKZIP's average ratio was 36%,
StacPack's was 43%, and Compress's was 50%.

Fig. 5. Compression vs File Size, Text Files ( Compression
Only ).

Fig. 6 is the comparison among the 4 packages mentioned in
section IV.A. One observes little difference among them.
However, ARJ221A stood out as the best and LHA213 was a close
second. Good compression ratios are spread evenly between 10K
and 200K, which is consistent with the findings in Figure 5.
Notably, for small files, one does not find good ratios
because the overheads of the software packages are too
overwhelming. In comparison with PKZIP, the PSpice file at 135K
was compressed to 6.2% by ARJ and LHA. The overall ratios of
the packages were 31%, 32%, 36%, and 34% for ARJ, LHA, PKZIP,
and PAK251, respectively.

Fig. 6. Compression vs File Size, Text Files ( Compressed
& Archived ).

Fig. 7. Compression vs File Size, Executable Files
( Compression Only ).
2. Executable Files 
Fig. 7 and Fig. 8 compare the compression ratios of
Executable files among 6 software packages. The curves show
many more peaks and troughs than those for text files. However,
the size range between 30K and 300K is the most stable range,
with better compression ratios than the other ranges. As sample
size grows in archiving, ARJ is better than LHA, and PKZIP and
PAK251 are tied. Additionally, one recognizes that 'Compress'
does not perform well for .EXE file compression. Notably, PKZIP
compressed the PKUNZIP.EXE file to 77%, ARJ and LHA to 74%, but
Compress shows an expansion to 102% of the original file size.
PAK251's 7.6% ratio for the 1.1K gen41l.exe is the smallest
ratio. The average ratios of the packages were 51% for PKZIP,
56% for StacPack, 76% for Compress, 48% for ARJ221A, 49% for
LHA213, and 49% for PAK251.
3. dBASE Output Files 

Fig. 9 and 10 show curves that are somewhat linear
as the size grows. That is because, as the file size grows,
the amount of overhead or format information differs little
from that of a small file. Sample sizes between 20K and 500K
are the most useful range of dBASE Output File size for
obtaining the smallest compression ratio. In Fig. 9, after
10K, Compress is approximately 10% better than StacPack, and
follows PKZIP closely. In Fig. 10, PAK251 also outperforms
PKZIP after 20K.

The smallest ratio for a dBASE Output File is 12.3%, for
'quad.dbf'. This file contains accounting information
of personal names and addresses. Average compression ratios of
dBASE files are 22% for PKZIP, 29% for StacPack, 24% for
Compress, 18% for ARJ, 18% for LHA, and 19% for PAK251.

Fig. 8. Compression vs File Size, Executable Files
( Compressed & Archived ).
4. Image Files 
Curves in Fig. 11 and 12 show V-shaped plots except
for abrupt jumps at the 70K range. This might be because of
'bv.sr' and 'bfg.sr', which are Black/White normal pictures.
Except for the 70K cases, the results show that files sized
between 10K and 100K benefit most from the compression.
Graphics users must note that some image files are
resistant to the compression algorithms. For instance, the
gray-scaled .GIF image files have 100% to 132% compression
ratios. Since .GIF data are already compressed with LZW, this
indicates that the software package only adds overhead. If one
needs to compress those files,
it is necessary to change the format from .GIF to .PCX or to
whatever is compressible. It is noted that one can convert
.GIF to .PCX format (with some expansion) and then compress
the .PCX files. By doing this one can have a net compression
ratio of less than one.

Fig. 9. Compression vs File Size, dBASE Output Files
( Compression Only ).

Fig. 10. Compression vs File Size, dBASE Output Files
( Compressed & Archived ).

Fig. 11. Compression vs File Size, Image Files
( Compression Only ).


ARJ and LHA remain the best software for
compressing image files. 'Scree.rf' at 40K has a compression
ratio of 6%, which is the best from the experiment by ARJ and
LHA. Each of ARJ and LHA has its own favorites; for example,
'bdy2.cbd' at 190K was compressed to 6% by ARJ, but to 48% by
LHA. The overall compression ratios were 51%, 56%, 58%, 46%,
47%, and 52% for PKZIP, StacPack, Compress, ARJ221A, LHA213,
and PAK251, respectively.

Fig. 12. Compression vs File Size, Image Files ( Compressed
& Archived ).
5. Overall Performance Analysis 
Fig. 13. Compression Ratio Comparison ( Total Compression
of Each File Type ).

Table V Compression Ratio Comparison

'Compress' shows the worst capability in Executables,
but is better than or close to StacPack in dBASE and Image
files. PKZIP had the same average compression ratio in Image
and Executable files. Besides, one has to recognize that the
.ZIP file format is the current standard in the data
compression world. ARJ and LHA have kept steady low compression
ratios in most kinds of files. ARJ proved slightly more
effective on every type. However, the differences are only
1.3% in Text, 0.6% in Executables, 0.3% in dBASE, and 1.7% in
Image files. LHA gets the nod over ARJ because the header it
attaches to its self-extracting modules is both the smallest
among the six programs (1.9K) and the one with the most
potential for customization.
If we use < to indicate the relative compression ratios, then
ARJ < LHA < PAK < PKZIP < StacPack < Compress. In other words,
ARJ outperforms the others. Using the self-extracting
technique allows the sending of compressed files to a party
who does not have any utility to decompress them.
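
The ordering can be reproduced from the average ratios quoted in
sections IV.B.1 to IV.B.4. The Python sketch below takes an unweighted
mean over the four file types; this averaging is an assumption made
for illustration and is not necessarily how Table V was computed.

```python
# Average compression ratios in percent, as quoted in sections IV.B.1-IV.B.4.
ratios = {
    "ARJ":      {"Text": 31, "Executable": 48, "dBASE": 18, "Image": 46},
    "LHA":      {"Text": 32, "Executable": 49, "dBASE": 18, "Image": 47},
    "PAK":      {"Text": 34, "Executable": 49, "dBASE": 19, "Image": 52},
    "PKZIP":    {"Text": 36, "Executable": 51, "dBASE": 22, "Image": 51},
    "StacPack": {"Text": 43, "Executable": 56, "dBASE": 29, "Image": 56},
    "Compress": {"Text": 50, "Executable": 76, "dBASE": 24, "Image": 58},
}

# Unweighted mean over the four file types; a smaller value is better.
means = {name: sum(by_type.values()) / len(by_type)
         for name, by_type in ratios.items()}
ranking = sorted(means, key=means.get)
```

Sorting by this mean gives the same order, from ARJ (best) to Compress
(worst).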

The files compressed by PKZIP were mostly 'imploded',
which employs LZ77 and Shannon-Fano coding. With this in
mind, considering the algorithms of the well-performing
software packages, one can conclude that LZ77 (SW), Huffman,
and Shannon-Fano coding yield the lowest compression ratios.

Fig. 14. Execution Time Comparison (Compressed & Archived).


Figure 13 shows that the dBASE files can be compressed 
the most in comparison to other file types. Notice that the 
binary files, Executables and Image files, have the highest 
compression ratios. 

Although the differences are slight, some products
outperformed others in compressing particular types of files.
ARJ was best at compressing ASCII and executable files, while
LHA realized the most out of the graphics formats. PAK251 is
better than PKZIP in compression ratio except for Image files,
although the difference is a mere 0.6%.

Table V shows the general compression ratio for 4 file
types and 6 packages. As one sees, ARJ ranks at the top in all
file types, and Compress the last. It is also shown in Figure
14 for clarity of comparison.

Figure 14 shows the execution time of the 4 archivers. One
cannot see a big difference among the packages up to 1
megabyte. However, PKZIP on a 33-MHz 486 with a hard disk of
18 ms access time took 44 seconds to compress and archive 7
sample files totaling 2 Mbytes. In the same environment, ARJ
took 17 seconds more and LHA took 8 seconds more than PKZIP.
On the average, PKZIP is the fastest product. LHA and ARJ, the
best compressors, still lagged behind the leader in speed.
Details are shown in Appendix C.
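
The timings quoted above can be turned into rough throughput figures
with a few lines of Python. The sketch assumes that 2 Mbytes means
2048 Kbytes; the variable names are hypothetical.

```python
PKZIP_SECONDS = 44
TOTAL_KBYTES = 2 * 1024        # assumption: 2 Mbytes = 2048 Kbytes

# ARJ took 17 seconds more and LHA 8 seconds more than PKZIP.
seconds = {
    "PKZIP": PKZIP_SECONDS,
    "LHA": PKZIP_SECONDS + 8,
    "ARJ": PKZIP_SECONDS + 17,
}

# Kbytes of input processed per second of compressing and archiving.
throughput = {name: TOTAL_KBYTES / t for name, t in seconds.items()}
fastest = max(throughput, key=throughput.get)
```

On these figures, ARJ needs 61 seconds and LHA 52 seconds for the same
input, so PKZIP has the highest throughput.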
V. CONCLUSION

Archiving and data compression utility programs allow
users to store data files in a highly compressed form, which
conserves storage space and improves telecommunication
services. Archiving utilities also permit groups of files to
be stored together in a single 'archive' file. Single files
are easier to move, copy, store and manage than are ad-hoc
collections of individual files [Ref.3]. There is no
distinction between compression and archiving for software
that provides archiving only.

Efficient information queries on archived and/or
compressed files without unbundling the entire file system is
one important area for further research.

It is believed that compression will play a greater role
in the future of personal computers and data communication.
This is particularly true in multi-media applications where
large amounts of information have to be transferred and stored.
However, that may require irreversible compression.

While data compression is not appropriate for every
application, nearly 30 years of research on the subject has
demonstrated that there are ample areas for research. It is
valuable in data processing for efficient data transfer and
storage.

As all the techniques have developed, we see now that data
compression has become a part of routine data processing and
communications. There are still many problems related to data
compression that remain to be solved. For example, error
detection and error correction are not incorporated in most
software packages. A major use of data compression today is in
communication systems. Compressing a message reduces the time
and cost of sending it by an amount often equal to the
compression ratio. Several popular software packages for data
compression and archiving have been investigated and applied
to files collected at NPS. The results show that, in general,
PKZIP is the fastest and ARJ221A has the best compression
ratio. Therefore, ARJ221A is relatively the best archiver. The
details are reported in Chapter IV.
APPENDIX A. RESULT OF EXPERIMENT FOR COMPRESSION SOFTWARE
Table 1 COMPRESSED, TEXT FILES

** For various sizes of Text Files, Compressed only
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20K thes1.adoe 17408 "e207 55 ./ ees 4244-9232 54.2 
16K- arrow.doc 21582 10226 «47.4 10699 49°50 16224 3372-4. 
24K cshel.doc 24911 “777 31.2 2O10G 4056 cee 38 ie 
redu.c 21931 4820 224 7) men SZ 30.9 7201) 32m 
SD lide 2 Aye Oe 32.60 /9e5 Se Sales 3G 
Avg Ps Say SI 7) 2) 3) 33°. 1859 | 40.0 10039 46% 
AORKIGiia ce Clee mene 23 13881] 32.9 16665 3925 2326 ae 
32K- matla.hlp 50425 20556 40.8 252/77 “S0]!) 262603 
48k setup.inf 50014 12898 25.8 15882 3178924094) 0445 
eval.lib 52515 15614 29.7 21426 40°78 27242520 
parts.hlp 33583 9424 28.1 12721 37.9 13855 igre 
Avg 45752 14435 31.6 18394 40.2 22984 (aime 
70K holid.doc 55584 31797 ~57.2 323152 95936842 tc.) 
56K—8) Mead. hile fo4 13639 25.6 19123 36707517 e430 
84K check. hlp 52616 17631 33.5 23455 4486 21757 see 
util.doc 79144 24687 31.2 320/738 40%5 33434007 
Clase 2aoc 55736 13952 "25 20Mio tea 34.4. 592 3 5a 
Avg 59253 20341 34.3 25398 42.9 28020. eee 
120K qbasizthlp 130810 130810 100. 99232332 9235 155277 ee 
96K- anlg.lib 138727 18629 13.4 61036 44.0 48665 aoe 
144K texe lb 1316533 ero 7.7 16533 12.6 32664 eee 
Ehy © il Posie 5 346 uc 7.0 26053 195.9 27 153Gaee cee 
lin.1lib 110682 14313 12.9 24944 22.5 3606s 3277 
Avg 129444 36660 28.3 48180 37.2 60115 .04c 
190K ssims.mdr 212493 23513 11.1 372391 1776930370. 
152K-eval2.dat 159201 98536 61.9 TQ1937 6470 Nia ae 
228K bipol.lib 185420 25906 “14.0 3997/7252 1) 6%e.0 eee 
Gdiode.lib 158181) 22716 14.42 GiGo2 19.4 44552 28.2 
pwr.lib 184757 22377 12.1 28532 (¥Se4a4s25 oye 
Avg 180010 38610, 21.4 47545 @26.4 692758 3a 
300K quatt.hlp 287589 104755 36.4 26345 42290 (ivr. 

240K- 

300K Avg 287589 104755 36.4 126345 43.79.1400 te. 




500K ridm.txt 454374 147912 32.6 197406 43.4 173423 38.2 
400K- 
600K Avg 454374 147912 32.6 197406 43.4 173423 38.2 


800K tchel.tch 976250 555033 56.9 590872 60.5 704621 72.2 
640K- 
960K Avg 976250 555033 56.9 590872 60.5 704621 72.2 


i@ietienne ,0G7 706 1,463,707 ila) Shemebe) S: Zeeol > 7 oS 
Rate © 100 % 56)..0) 2.1% 49.6 % 




Table 2 COMPRESSED, EXECUTABLE FILES 


** For various sizes of Executable Files, Compressed only 


File Size Execute PKZIP StacPack Compress 
0.5K isat.exe 568 104 Seas. 9a L6G. EOS 19 
400= Ghkri.coen 66e 604 87 yee 5 SiG San2 6216 9 las 
600 rambi.com 307 268 S i eaeecs 2 Tomo - 27m Somes 
exetl.com 413 413 100 we oc 96.1 ai 10 
fasto.exe 680 ENS ee 207, 3075 224 32 
Avg ay) 1s 321 60. Sas. oe a eS, G2 740 
1K egaep.com 1006 665 66./ 5" Ges Ooree 759 7 oe 
800- gen41l.exe 1125 Ike 10.4 120 LO 2. 12 Li 
1200 leadt.com 1ian 60" 53.7 GOe See Gee 6 se 
prtsc.exe 1176 419 35.6 4k6 3554 460 397 
Avg p29) 452 40.7 448 AQeSs Sino 4529 
1.8K Curso. cem, lasZ Oe logs Sl. rleiee S lsO) 1S ekG e071) 
1440-gen42.exe 1477 176 Aes es oes a 1235 220s 14.6 
2NGO6G7Ves. com Los eee 64 oe. 638 Ibee 76a 
runti.exe 1590 7 3Be 47.7 766 48.2 Si 5 1G 
dbase.exe 1588 754 ag. 52762 48.0 808 502 
Avg 533 72a 5025 776 50". 69 Soe 5 676 
3K egala.com 2388 gee) fe A862 1270 49.0 67s 6 ie 
2400- more.com 2618 2044 185.0 2056 TB 6s 250 5G 


3600 appen.exe 2902 2902 100. 207% 106. 3574 iZ3e 


setna.exe 3174 Lory 62.3 1977 62.9 2303 12a 
astel.com 2557 1796 70.2 ~BSne7 Fiat Zoe 82.6 

Avg 2726 1974 (2 20S 7420 2362 S66 

5K edit.exe 4837 2654 7525 33272 6/:/7 32305 1 oa 
4K- strid.exe 4837 gies 65258 3327 2 67.7 3805 1Seg 
6K shell.com 3894 1072 2/7295 2s 2624 263 32a 
grep2.exe 5934 SG 61.8 3753 63.3 4570 7 Tee 
Couch. com 5 lake Bey 65.4) 343m 67.0 4002 782 

Avg 4924 2985 60.2 (2747 55.8 saeco 70m 

8K stup.exe 7520 e410) 2 71.8 S566 1 2 9 Gos 9245 
6400-wpinf.exe 8192 63.57 835 7/7) Se82 65.1 6Giy SOs 
9600 patch.exe 6788 AS si: 67.7 e469 69.3 Bbeag Sow 
grep.com 7029 4599 65.2447 ek 67.0 57090 S leeeZ 
tasm2.exe 6984 4064 58.2 4194 60.1 5225 TAS 

Avg 7308 Se Om 69.8 4897 6721. 6065 83.0 

13K Grab..com 15842. 7508 AGO 52.9 4479. 79 ieee 
10.4—= Mips. com, 12 saz sel 40.8 5962 44857492 56.a: 
15.6 check.exe 10043 6588 65. 666811 67 76 Bends 8 lee 
share.exe 13424 7508 Dae 7826 58.3 Jizew aor, 1 
eilewexwe  1OGD9- -7/560 Or Uo oi®, Po. 9463 88.8 
Avg IAG BG ast 95.3 Gib 3 ao 9770 eee 
PeiemonuZ.exe 235238 2el25 77.0 18829 80.0 24075 102. 
16K- reduc.exe 16505 11153 67.6 11610 70.3 15326 92.9 
24K st.exe 17184 mrbo6 764.6 l>96 “67.5 168938 98.3 
meee emma, Ole ilrZ46 67.8 T1700 70.1 15514 92.9 
mcstr.exe 16395 8645 See) oT |: ao26 LIiZza4 68.6 
Avg Meigs Zeol  6G6.0 12569 69:6 1661] 92.0 
40K Waeeme Xe 540296 25052 74.5 26604 74.6 34170 99.6 
32K- lha.exe 34283 25096 73.2 26111 76.2 32687 95.3 
48K dcm.exe 45212 22860 50.6 24762 54.8 31658 70.0 
copy.exe 42398 19889 46.9 21428 50.5 28341 66.8 
cfig3.exe 41984 24071 57.3 25429 60.6 31974 76.2 
Avg 39635 23494 Domo oor 62) 31766 80.1 
70K chkfd.exe 59208 34620 58.5 36793 62.1 47949 81.0 
56K- hdini.exe 75915 41549 54.7 44976 59.2 54061 71.1 
84K drce.exe 84322 42471 50.4 46244 54.8 58349 69.2 
globa.exe 83854 40895 48.8 44796 53.4 56723 67.6 
local.exe 58742 27938 47.6 30722 52.3 42366 72.1 
Avg 72408 37495 51.8 40706 56.2 51890 71.7 
120K conve.exe 105141 59625 56.7 64078 60.9 87955 83.7 
96K- graph.exe 107520 70670 65.7 74884 69760 -b0zZ937 95.8 
144K 113.exe 93399 49929 53.5 53340 57.1 67811 72.6 
1ft.exe 111894 22043 19.7 24919 22.3 28198 25.2 
Avg 104489 50567 48.4 54305 52.0 71738 68.7 
190K desig.exe 175292 80785 46.1 87466 49.9 140659 80.2 
152K- dash.exe 197274 98509 49.9 105393 53.4 151549 76.8 
228K disp.exe 179560 63846 35.6 70165 39.1 96975 54.0 
dsi.exe 167734 61394 36.6 67199 40.1 91795 54.7 
exec .exenl723588 69147 37.8 70683 41.0 100567 58.3 
Avg iiWeeeon,; 3256 42124780181 44.9 116309 65.2 
300K mcad.exe 289664 142159 49.1 153675 53.1 224254 77.4 
240K- check.exe 351232 170400 48.5 183369 52.2 253029 72.0 
360K cproc.exe 376486 159581 42.4 176076 46.8 257589 68.4 
wcSim.exe 284184 107365 37.8 118984 41.9 181342 63.8 
pegpp.exe 273432 125300 45.8 137224 50.2 195168 71.4 
Avg 315000 140961 44.7 153866 48.8 222276 70.6 
500K mat38.exe 428768 209473 48.9 224145 52.3 303367 70.8 
pe eh-Oppsc.exe 416612 1888e3 45.1 204706 48.9 284042 67.9 
600K stmed.exe 548640 262525 47.9 284796 51.9 408439 74.4 


probe.exe 543952 244485 ? 260796 ? 385640 ? 
Avg 484993 226342 ? 245111 ? 345372 ? 
800K pshel.exe 635552 ? ? 325990 ? 454701 ? 
640K- pspic.exe 781504 ? ? ? ? ? ? 
960K tc.exe 887104 467688 ? ? ? 645948 ? 
Grate. 644029 644029 ? ? ? 907302 ? 
Avg 737047 441392 ? 476474 ? ? ? 

Total 8,576,430 4,405,559 4,757,878 ? 
Ratio 100 % ? 55.5 % ? 


Table 3 COMPRESSED, dBASE Output Files a. 9 > 


** For various sizes of dBASE Output Files, Compressed Only 


File Size dBASE PKZIP StacPack Compress 
Meek stokn.dbf 640 332 59.7 367 bia OSL 5O 5 
400- sysid.dbf 418 190 a5. 5 phy 5 42.0 204 48.8 
600 syst1.dbf 427 166 38.9 149 S10. Dulka BO. 
trans.dbf 640 201 a7, O29 Tee > Oe Sy a8 
Avg Bo | 260 49.0 244 GeO. 2S 5s. 3 
1K sales.dbf 894 342 6a, 3.8s42 Sigua 3S 3-7 > A & 
eeOo- stokp.dbf 896 A436 Ao. | eae Aa Ges) SOS 
lmaoo acctr.abf 1280 ral cal ls S420 463 6-2 55 1 Aoo oO 
codes.dbf 1152 B32 AG. 2 eo 7 AVS. al Weyl S. AV 
items.dbf 893 346 38.7 52 Soa ooo eel) 
Avg 1023 420 Ad. 1 7427 41.1 469 ANS ars: 
ieee clien.dbf 1664 849 lien Oem oes Sie (Crom 52.9 
metO= Ccust.-dbf 2048 or A2 oe 7 eo Aa > FOR 48.4 
2160 peopl.dbf 2048 984 18. Gplos 3 50.4 985 Ae 
systa.dbf 1539 449 29.2 2478 oa 4 27 353 
Sstock.dadbf 1664 585 35.2 "GOo72 36.2 804 4223 
AVG lee 3 3 748 Ae ee Aeoe Do. 4 MF II 
em Conte.abf 2304 934 AO 5 297.0 AD eA OOS eas: 
2400-custo.dbf 2666 153756 SOs) 1 2 536) ASO 56.3 
Biooo inven.dbf 2371 823 34.7 253 3620 “1125 a7 4 
hal3k.dbf 3268 483 a5i.4 157 5 vous  loe 46.5 
Bie cot 3202 997 Bae. BPO 6 / 3 17 Bs 0 
Avg 27 G2 1119 Oc 5: sees 5 Ae: 1292 AE 18 
Bee GOOOod.dadbf 5120 1144 Z2.. 3) BES 50 2G 4) Sos oes 
4K- names.dbf 4096 2163 Baw meee | Bistie/ 122 oe Sy aa | 
6K sysco.dbf 5586 959 7s 2 kOS loc  ES06 27 iO 
dba4.dbf 4969 Ss 2 Ce eer cal 340" 2207 A4..6 
haS5k.dbf 5296 2207 a, 7 e2475 46.7 2298 43.4 
Avg S013 L605 61.9 Fi 30 Siar). US 6 3976 
8K syscl.dbf 7831 ileal 1S. Ses Oo 2 Once 2 b2o OFT) 
6400- dba2.dbf 7842 2S 7 Ze. 9 pew OG B45. 33 17] 402-5 
$600 8k 1.dbf 8194 1924 23657 2296 Zee 2558 Salen 2 
hal&8k.dbf 8260 BZa3 S92 748 760 AS 516 aes CIOS A Ah 
oleeeaabt 194 1888 232202265 Die oe Salis 
Avg 8064 ZALES O 26 come oS 31.2 2806 eal es) 
few emolo.dbf 12288 3615 29.4 4412 ys) SI! yagi) ie os OS 
10.4- dbal.dbf 12639 3339 26. coe 4 ot 6 93 734 2925 
i -owottic.dadbf 1126] 3808 33.6 74381 38.9 4831 ile. SS, 
Immo dbf Y3252 4873 36.8 wood 7 A239 = 597] 662 7 
gaadigdbi 13455 IMSS ese Palo a) 2. 2 Lee eo 7 23a 
Avg LZ 579 Sa / 2o. Oro Gs 34% [ae 336 
20 Kup Abas Gdbisleo245  j40e 24. DSS 28.8 weds 2452 
16K=.0f 1] 1). dbf eg 74 Wai 260.2 oes 34.8 8827 3 Das 
ofil2-dabi ZOl02 4632 23.3 86726 33.5 7US6 a5 70 
20k 27dbGe20258 “See io. 54 26.4 5is9 25-46 
h20K.anT 2027 2.7 bod SD 24 Mere s 45.3 “FE0s 37s 
Avg 19230 49568 29S Borers 3336S ee 2/ wa 
40K h40k.dadbf 40240 13681 34.0 17948 wa 76 B71 eee 
32K- ha40k.dbf 40240 13662 (~34.0° 17910 4405 Bee 2 eee 
48K 40k _ 2.dbf 40354 7389 18.3 10368 2537 3247 28 
40k_1.dbf 40354 7423 18.4 1TO40] 25578" 9ie8 Zone 
Avg 40297 10539 26.2 14157 35.1 11474 > 2ee 
7OK h70k.dadbf 70192 23419 33.4 30876 4420 225757 
56K- ha70Ok.dbf 70392 23333 "33.2 311297 4450 22557 eee 
84K 70k_2.dbf 70530 12458 17.7 17934 25°74 223695 Se 
70kK_1l.@bf 70530 12602 17.9 17897 25°43 2 
Avg 70361 17953 25.5 24459 34.73 66). eee 
120K h1l20.dbf 120190 39630 233.0 82920 44350527222 3cee 
96K- hal20.dbf 120190 39676 33.0 52929 44 0837. 
144K 120k1.dbf 120802 21242 17.6 30609 2573 3655) 3. ee 
120K2.dabf 120802 22045 “17.4 3054) Se eee Love 
Avg 120496 30398 25.2 41750 34.6 30461) 92555 
190K hl190.dbf 190156 62354 32.8 83892 4421 579). 
152K-190k2.dbf 1912170 33181 17.4 48186 25°22 25.423 
228K 190k1l.dabf 191170 33111 17.3 48034 23355022) 
Avg 190832 42882 22.5 GOQ37 3185 4313 73. 2 
300K 300K2.dbf 301762 52181 17.3 76196 2573 55325 uceee 
240K=-300k1.dbf 301762 52090 17.3 76l20 Jee? 5527 
360K Avg 301762 52136 17.3 76158 25.2 55500 eee 
500K 500k2.dbf 502818 86410 17.2 1226604 25°22 93237 5 
400K-500k1.dbf 502818 86479 17.2 26627 25-2 @eeee ome 
600K Avg 502818 86445 17.2 #26616 23.2 “2425 ee 
800K zipco.dbf 967384 304450 321.5 S45132 G55 so0s2- 2a 
640K-800K2.dabf 804450 138040 17.2) 20212 eee ete ieee 
960K 800kK1.dbf 804450 138066 47. 2 202 33 ee 
Avg 858761 193519 22.5 24986327 ee eee 

Total Bs 7. Cee 1,295,643 1° 745,878 lees 715 

Ratio HOO! -s 2 liesmec 29.4 % Pr 
Table 4 COMPRESSED, IMAGE FILES 


** For various sizes of Image(Graphic) Files, Compressed 




File Size Image PKZIP StacPack Compress 
O.5K augus.svg 494 rsG 2) .. 5 gee 2oa2 AZ Zo. / 
400- aushh.svg 494 eS) 26 6.1) wi 7 wee. 36 ZO 
pebbl.svg 494 153 S10 232 Ze2 155 31.4 
Ciseivik. £ Be) Bee 647. 9570 61.8 406 O76 
compa 460 294 SS) 2 fERs Die 306 SO 5 
Avg 508 220 43.3 204 One 229 401 
1K freel.wpg 1210 642 53.1 7656 Beare 76k 64.5 
sgOo-— mktbl 1044 Vs2 Oe lees Gi. Oo 188 aoa © 
calge.£ 877 546 62.8 527 60.1 584 66.6 
weep .f 893 509 57.0 489 54.8 573 64.2 
pglab.f 924 465 50.3 451 48.9 566 61.3 
Avg a0 BS: DG... Soo Bon? O36 S65 
1.8K free3.wpg 1422 Goal 28.6 693 Moa.) 76 Sr. G 
1440-fhvst.wpg 1916 1050 #£54.8 1067 55.7 1299 67.8 
2160 free5.wpg 1644 718 43.7 726 44.2 985 59.9 
free6.wpg 1618 762 47.1 765 AG) | Sy eeeS 61.5 
Avg ros 0 BO 48.8 813 49.3 1039 630 
Ser snow.rt 3478 1400 270:, 3) mee 0] Wee lao Oe, 
2400-patti.shp 2432 745 B07 657 oo S225 Live 48.5 
3600 headc. 2842 W-Ba)e) ood ee ibe ae a Die. 17 63 Oi / 
metal Zo 6 P08 59.5, W529 GO.3. L721 67.9 
fjamm.wpg 2412 Disoweees.teiie9 49.3 .7475 £461.2 
grap2.wpg 3558 1178 33.1 1200 33.7 1729 48.6 
AVG ZE76 1242 Neo eae | 26 a: 44.6 1603 Dw. 
Bre verti.vrs 4945 1449 Zo Some) 7 See 2 oa, | 
tere tonti.shp 4096 o> Ds. 3) A> 7] MeO 2157s re. / 
6K haal.dwg 4368 ipods: a4)... 1876 Me 2020 46.5 
colo 5860 2598 44.2 2928 Bue S126 Do. 4 
Garfi.imi 4961 2493 Om cose Dee oe aS, 
grope.f 4538 ine 53 40.8 1968 43.4 2342 516 
Avg Wee, Te te(G, Oo ee OO 45.0 2483 Ss: 
8K e3830.dwg 8464 2596 30. 7 ee099 B16. Oy 5.6 42.0 
6400- etbl S279 35) 72) 40.3 3855 46.0 4492 Saras 
9600 imdri.f 8942 230 Srl) oe 6 / Bn 3997 5 43.3 
sSunvi.sha 7685 22976 38.6 2406 44.3 4063 Bs 2, 
syn.me 9147 2660 2 leo 2 Bord 3553 sisi eee: 
teapo S227 2807 35s 7 40.6 4028 490 
Avg 8474 Zooo 343 peo 6 a0 39235 46.4 
ienomeatren.dat 3717" 2912 Zee eee Boro 387 7 Zoe 

10.4-bearl.rf 15200 “see 25.6 4594 3 ORFZ ow 7 3 24.8 

15.6 Genius vrs sis eo 41.4 5337 43.2 6140 497 

Main.shpealiz64 “aes 43.6 ‘247 4, 5 "Oe 2 54.9 

thes2.dwg 11984 4620 3°. Goole 46.0 6579 46.6 

Avg 12905 “azeo S320 oe 3770 10 39.6 

20K Clownsrt sl yelo | SeuicG 34.6 6866 33.5 6128 43.4 

16K=— Eurk. Fri ozes “see 47.8 O01 3 5.9 Geos 9 OG 

24K aeroe.eps 21577 “Grze So 1... 2 enG0 35.4 -FOms 41a3 

bord.shp 20608 7525 36.5 8054 39:21 -bOS 74 5 as 

tsal.dwg 18688 8280 44.3 9550 51.2 10355 Sse 

Avg TOS9> “ose 38.7 83 42.9 9176 46.8 

40K golf.dat S@mme6G 7i0Z 14.2 e200 l/s3 1 Perl “2a 

32K-— birds.rf “4785 7377 36.3 19056 39.73 19989 Aer 

48K scree.rf 38147 3124 G2 2427 LOc 634300 lis 

sql.sha 44976 W2Z175 27.1 Seer 35.4 20129 44.8 

IMGS2rGo s0gs2 Agel iS . 81520 17.0 s6me85 22 ol 

Avg 42385  OO@s 21.1 P0569 2570 Seo 29.4 

70K slib.shp 70400 40863 58.0 48621 6863 4912820 “Gome 

O90) = bvese e442 68807 Slliz25 72406 8520 (4205 830 

biqese. 77029 64653 83.9 68568 88.9 6G8073 32208 

show rep Ov l 7 30978 3/7. / Meme 73 40.5 50642 612% 

dale 82174 31758 38.6 34061 41.4 45660 3253 

Avg 79254 47412 59.8 60366 63.6 57554 ie. 

120K augus.ml18 111864 27412 24.5 32016 28°76 304345532777 

96K- bush.m18 111864 24910 22.3 29759 26.6 28922 2 

144K peb.mie £11864 24980 22.3 22722 26,2 23504 25. 

lenno.iml 129632 36770 28.3 Wze13 32.6 37095 4am 

movie Ens Sles 36709 39.3 94754 42.4 57196 583% 

Space.iml 129700 22373 17.2 W509 22 wee 15 

Avg LVSS8 ] geo 2 25.3 33756 2922 32827) Fee 

190K plant.mift 177930 40202 17.0 44557 25.0 45708) 2am 

152K- bdy2.cbhd 228799 1174240 49.9 273052 5126 Ove ya 

228K  aimg5.rgb 223800 181291 81.0 197694 35.7 215044) cc 

img9.rgb 200427 139112 69.4 148318 74°70 136806N6cm 

imgl4.eps 176370 WS@C7 41.0 90457 Si 974199 32 

Avg 201475 107766 53.5 Ile@6l6 Seno" 1lo7 see ae 

300K adqd.éps 320174 YO8a99 33.9 B27433 2253 Woo 

240K-img13.rite 243696 149260 61.2 Vee@506 Ger7 13905 s6773 

360K bdy.cbha 269981 "84222751. 2 “e146 Ss er ee 

Wal) oye%e/ 71 362473 147958 40.8 170508047 ~ ihe 4s Oe 
Avg Powe oreeer 9 OOo. oe DO0424° 50.6 157585 53.0 
SeeGwesa4qu.scr 494267 207812 42.0 226309 45.8 294227 59.5 
eto) 2 22S 920772 171576 22-6 217306 41.3 169695 32.2 
Pei oovee.scre 471937 1771599 36.4 1872/1 39.7 272544 57.8 
Pees eeeoe ooo 2 SO 50O2 L745 o21548 75.7 291383 68.6 
Avg Were 7 ee S024. Sz se8 109 49.7 256962 53.6 
mueniaemome Ger 767399 247580 32.3 273665 35.7 43190254 56.1 
peek ball.ser 742684 386224 52.0 419076 56.4 471329 63.5 
coeneoedc.Ser 802894 540307 6G/.2 571313 71.1 534562 66.5 
homes etamoome OS 6027157768.9 70S721 73.7 646509 67.3 
SQlmpese wlO7O)l1 620446 76.7 884196 82.6 833651 77.9 
Avg BGs (emo 45yol. 1 571406 65.7 3383261 67.1 

Total 10,633,651 5,149,569 5,625,605 5,828,549 

Rat1oO 100 % Sly. Sas Bo. 1 Doe lbao 




APPENDIX B. RESULT OF EXPERIMENT FOR ARCHIVING SOFTWARE 
Table 5 COMPRESSED AND ARCHIVED, TEXT FILES 


** For various sizes of Text Files, Compressed and Archived 


File Size Text ARJ221A LHA213 PAK251 
0.5K shutt.mcd 555 345 62.2 345 62.2 412 74.2 
400- oilri.mcd 554 345 62.3 344 62.1 411 74.2 
600 spira.mcd 640 407 63.6 407 63.6 486 75.9 
cond.m 446 291 65.2 291 65.2 343 76.9 
dec2h.m 555 347 62.5 347 62.5 429 77.3 
Avg 550 347 63.1 347 63.1 416 75.6 
1K feath.m 1207 661 54.8 661 54.8 845 70.0 
800- anhar.mcd 1025 623 60.8 623 60.8 757 73.9 
1.2K polar.mcd 809 502 62.1 501 61.9 603 74.5 
hex2n.m 1053 580 55.1 580 55.1 729 69.2 
expml.m 804 467 58.1 467 58.1 574 71.4 
Avg 980 567 57.9 566 57.8 702 71.6 
1.8K bode.mcd 2258 1218 53.9 1218 53.9 1427 63.2 
1440-boole.mcd 1455 761 52.3 761 52.3 990 68.0 
2160 brake.mcd 1947 1000 51.4 1000 51.4 1203 61.8 
compf.mcd 1528 806 52.7 806 52.7 964 63.1 
erf.m 2062 1010 49.0 1010 49.0 1168 56.6 
Avg 1850 959 51.8 959 51.8 1150 62.2 
3K anten.doc 2737 1370 50.1 1372 50.1 15045 “Sore 
2400- mks.med 3772 1671 44.3 1671 44.3 1877) Vaome 
3600 besse.m 2426 1165 48.0 1165 48.0 1336 55.1 
bilin.m 3076 1400 45.5 1401 45.5 Lega) “Som 
cplxp.m 3021 1317 443.6 1317 4.43.6 14628 ome 
Avg 3006 1385 46.1 1385 46.1 1556 51.8 
SK readl.doc 4259 2034 4728 2085 49.8 2207 (aoe 
4K- inst.doc 4029 1788 44.4 1788 44.4 1948 48.3 
readm.txt 5594 2258 40.4 2260 40.4 2491 44.5 
cgs.mcd 4383 1900 44.3 1902 43.4 2080 47.5 
direc.mcd 5112 2066 40.4 2067 #4,40.4 2291 44.8 
Avg 4675 2009 43.0 2010 4390 220 947 
8K stmed.msg 7900 2892 36.6 2894 36.6 3226 40.8 
6400- redm.mcd 7615 3421 44.9 3422 44.9 3659 48.0 
9600 bench.m 7377 2436 33.0 24397 33mOGnegeic mesg 
spi2.dat 9449 1832 6.4 18370) 19.42 
fload.c 8727 2699 306.9 2699 ~~30. Gleam umole 
Avg Ses Ze o© CS yore a B22 292 S655 
ook ole aeec 5240: “5659 Sgrell Sis ents Cee O32 40.2 
mema—-Lexthb.doc 15429 6712 43.5 Fooo3 43.4 7662 a a 
Mmee6 read.doc 15443 S15) 2 7 36. 45°56 55 So. 6 5960 Soar 7 
remezZ.m 15407 4291 ee eae 2 2723 4765 BHO) au, 
readqd3.doc 12006 4748 39. Seay OZ 3970. 5O2Z] ANE: 
Avg ra7O5 5407 45.0 95411 Se7,4 59) 2 a7 0 
BOK thesi.doc 17408 5936 S41 59 36 Ge 6513 Sew 2 
Meh arrow.doc 21582 9978 46.2 9940 46.1 11044 5 re 
Bek csahel.doc 24911 7499 Z0n 7605 SOS. 6tol 32.4 
redu.c 21931 4674 ZS a OS ele D0 25S 
Spll.dat 21477 6186 Zomo ol Lo Zoe 56620 Sule ss: 
Avg ZabaG i «Ge o> Se (OCG BAO 7530 Spore 
mer Cchara.doc 42223 ieee 2 S026 -131.52 Opie (a6 7 Se © 
eeeicela. nin 50425 19446 38.6 20289 40.2 21239 42.1 
48K setup.inf 50014 ieee wee vomeee Oo 1 25,1 13479 27.0 
Stoeseeo2ols> P5011 28.6 15327 29.2 16291 31.0 
Deters nlp 732553 8846 PS) 6 |S) IMSS. 2776 (9659 Zoe > 
Avg 5752 eee o OOO aed tal 3 DOr HeaOtS 32.8 
mo holid.doc 55584 30664 SSeS o D530) 32194 BT 5 2 
weaee mead. hip 53184 Memey i Zoaeewesez 7) 25.0 14660 27.9 
Seemeetcen im 52616 16849 32.0 17441 33.1 18217 34.6 
Memlaadoe 7/9144 Cae 3 30.1 24488 S052 2553 3 324.5 
Sesermaee 55756 13403 24.0 13793 2m eel) |66L226. 6 
Avg 79253 JS) Sree OO ee ard See OZ le 3 3570 
meercoast- mlm 130810 104803 80.1 107152 81.9 108785 83.2 
foe anlg.1lib 138727 17247 12.4 17484 ae Go 2367 15.,.4 
144K exe, Lib dtalo53 8477 Gran s Vo Gre, a 6. 92 
miieot. Peo ts5e46 8334 Geez, S552 Gree. 7? 60 OG. 
Mme pe laOes2 12018 tO eleiz 2 ieee O 53s) 14.4 
Avg Wee 4 Sel 6 Pee oes S100) 623.9 33686 26.2 
190K ssims.mdr 212493 18510 oh gan oro exs! See ewe a4: 9 POR-e 
ieee —evyals.dat 159201 91891 57.7 92303 Semon 20799 60.2 
Peek bipol.lib 185420 18868 Nh O ee le, iva? 2.Gue2 © eal 
diode.lib 158181 16226 TOss. 19024 Ze 0 ee 7 145 1. 
Puieelib 184757 17372 9.4 18174 Wee 2ge0 11.9 
Avg OOOO 3257 3 Ole SO? tere 052 18.4 
MereOolatuenrila 267569 96290 33.5 100159 34.8 103045 35.8 
240K- 
360K Avg Zovecoeeeor oO. 32.5 8OOW59 34.8 103045 35.8 


500K ridm.txt 454374 136477 30.0 145273 220 eS eee 
400K- 
600K Avg 4954374 136477 30.0 145273 32,0 15) oe 


800K tchel.tch 976250 444057 45.5 469940 48.1 47732e34c—. 
640K- 


960K Avg 976250 444057 45.5 469940 48.1 477326 ta 
Alenceult 4067-716 Wee58 eae 1,310,669  °ssesiaee 
Ratio TOI) es 30.9 % B22 ee 34. Cr 




Table 6 COMPRESSED AND ARCHIVED, EXECUTABLE FILES < Fig.8 > 


** For various sizes of Executable Files, Compressed and 
Archived 


File Size Executables ARJ221A LHA213 PAK251 

O.5K isat.exe 568 85 15.0585 i.0 SO eee 
400- chkri.com 688 S70 $22.8 570 82.8 595 86.5 
600 rambi.com 307 252 82 ..19252 S24) 268 Oo 
exetl.com 413 407 98.5 406 98.3 400 96.9 
fasto.exe 680 198 2 OS 29.1 193 28.4 

Avg 532 3.02 56.9 7502 56.9 308 DT aS 

1K egaep.com 1006 #640 63.6 639 63.5 684 68.0 
800- gen4l]1.exe 1125 103 8.2 203 S22. 85 Pee 
m200e Loadt.com 1131 574 SOs Si BOnS GGS 58.8 
prtsc.exe 1176 418 2) ays ee) thee’ See 419 35.6 

Avg 1110 ree e4 oo. lees See) 463 poles 

feok CUrsOo.com 1452 LL ee 77.3) eS We 3 L226 Sao 
1440-gen42.exe 1477 E57 O56 Sy: iOrrG 166 ee 
2160 6/7ves.com 1559 964 61.3 965 61. 9 ea 1. 89.9 
runti.exe 1590 7A Cees ee eee 44.9 1067 oe ere 
dbase.exe 1588 7A Or Ose us WoO LOG > 67.1 

Avg i533 734 oo ee 47.9 985 e423 
waeegala.com 2388 1100 26 ie LOO aS ge J ibe (oj 63.3.0 
EeoO0— more.com 2618 1971 oS, 3a 7 |. GB SPs 0) 94.7 
3600 appen.exe 2902 2770 (Shoes es 7) (on, 95.4 2902 LO O20 
setna.exe 3174 OAG 60, 2 1o iG 60.4 2440 7 Om 
aStGiecom 2557 i 36 67,9 Pls} Gyaoo 2232 S723 

Avg Z/2cG 1899 69.6 1898 697-6 232 Sac 

5K edit.exe 4837 3095 64.0 3095 64.0 3654 VAs aS 
mee 6 Strid.exe 4837 3095 64) 70525095 Gade 0 oO 654 (ess. 
6K shell.com 3894 1007 25.9 1006 Zone 24303 BS. 
grep2.exe 5934 3559 59. 653540 Soe ola 69.8 
BOuUCK «Geom 5118 3248 63.55.5746 G3 3629 Vee 

Avg 4924 2797 5O-- Gee oy BOG 5555 ore) oll 

8K stup.exe 7520 be oF i) Oe ee VO SSc9 7 Sa:S 
6400-wpinf.exe 8192 5092 62.2°5093 Gee 5746 eOmeae 
9600 patch.exe 6788 4417 Coomera Gon oO 7 TN IL 
Geep.com /029 4519 64 oao ls Gidee 3. SARC S CLS ane: 
tasm2.exe 6984 3933 BiG oaso oS 565° 4565 Co. 7 

Avg 7303 A645 Si Bae) BAGS: 65.6 5233 Foo 3 3S 

ike gGrao.com 15842 7818 49.3 7820 294 S632 Sy eS 
HO .4— mips.com 33597 5149 BiGa) @ oo © S8.7) 26120 26.0 
15.6 check.exe 10043 6393 63,7. 263901 637617043 910) 2 


share.exe 13424 7266 Bits I 7 Zap 54770 Feeee 59.9 
Cre. CxXem@hOGS9 “7253 69.0 7349 68.9 6006 Toe 
Avg 2656 6796 53.7. 6792 527-7 7570 59o we 
20K pkuz.exe 23528 17491 74,3 7230 72.23: eGee votre | 
16K- reduc.exe 16505 10916 66.1 10899 66.0 Tess “OS 
st.exe 17184 10852 63.2 208240 63m: TiS e7 67 .4 
red.exe 16701 11002 65.9 10867 658 11727 aor 
mcstr.exe 16395 8419 51.4 6406 52.3) 9 er 56.2.0) 
Avg fe OG3 Lag 6 65.0 Biya 64.9. 12552 69.5 
40K pkz.exe 34296 24487 71.4 (246265 712°8 25569 1426 
6K lha.exe 34283 24477 71.4 2461) yA ..S. 257 Gi 
dcm.exe 45212 22004 48.7 22139 19.0 23307 Sie 
COPY -exe=4239¢ 18672 44, 0 -rege4 Ae) ee 4] oa 
cfig3.exe 41984 23392 55.7 23472 55.9 24558 S622 
Avg 39635 22606 a7.0 22756 57.4 2034 60.3 
70K chkfd.exe 59208 33522 56:. 6 73 5.7. 56.9 3isa6o2 Bo .4 
56K- hdini.exe 75915 38888 5)...2) S96e2 B2.c 40472 5 Aas 
84K drce.exe 84322 40100 47.6 40587 468.1 42226) 55¢080 
globa.exe 83854 38833 46.3 39449 47.0 40979 48.9 
local.exe 58742 26650 45.4 26939 A ik 2S eles 48.2 
Avg 72408 35599 49.2 36068 50.1 67625 e208 
120K conve.exe 105141 55826 53.1 56929 54.1 59396 S623 
96K- graph.exe 107520 67494 62.8 967977 63.2 6G9SZ6 64.9 
144K 113.exe 93399 47234 50.6 47875 51.3 50096, “ss 
1ft.exe 111894 20825 18.6 WosG4 18.6 24008 212 
Avg 104489 47845 45.08 46401 4628 5063S 402 
190K desig.exe 175292 72765 41.5 74001 42.2 80172 AIS 
152K- dash.exe 197274 93040 47.2 94191 #47 S256 3ee72 
228K disp.exe 179560 57256 31.9 58745 32.57 G3c7 eee 
dsl.exe 167734 55602 53.1 57432 3461 62308 37 ae 
exec.exe 172388 57193 33.2) 50057 337 Geos 2 303 
Avg 178450 67171 37.6 66448 3824 7433252 
300K mcad.exe 289664 130528 45.1 132901 45.9 142049 49.0 
240K-check.exe 351232 155974 44.4 158586 45.2 72554421 
360K cproc.exe 376486 141780 37.7 145546 Ga27 e752 
wcesim.exe 284184 94761 33.3 97933 34.5 106577 37.5 
pegpp.exe 273432 110969 40.6 113517 41.5 122113 44.7 
Avg 315000 126802 40.3 129697 41.2 mao4io 44356 
5OOK mat38.exe 428768 199529 46.5 201689 47.0 211639 49.4 




mut =Oppoo.cxe 4156612 5175359 41.9 177211 42.3 188172 45.0 
600K stmed.exe 548640 244221 44.5 248064 45.2 264243 48.2 
Dis@pewexe 948992 2275/1 41.9 2371680 42.6 245129 45.1 

7. 3 2) 


Avg 484993 211745 43.7 214661 44.3 227296 46. 
800K pshel.exe 635552 276279 43.5 281149 44.2 307440 48.4 
640K-pspic.exe 781504 324796 41.6 331450 42.4 359426 46.0 
960K tc.exe 887104 434990 49.0 442889 49.9 459652 51.8 
Gace e~o moro O29 G2Z3980 96.9 624399 97.0 626453 97.3 
Avg Uwe] “SaioOllys6.3 4219972 57-0 438243 59.5 


Total Ss 76, 430 47105576 SO, 225 47,224,910 
Ratio 100 % 47.9 % ae). See a 




Table 7 COMPRESSED AND ARCHIVED, dBASE Output Files 


** For various sizes of dBASE Output Files, Compressed and 
Archived 

File Size dBASE ARJ221A LHA213 PAK251 
0.5K stoknJdbir c40 oj 49.1 314 49.1 ° 3975 Toes 
400- sysid.dbf 418 sys: 37. oes Sif) ees IES) A Gweer 
600 systi.dbf 427 Lay? 33. Sasa 333, 208 43 3 
trans.dbf 640 248 38.0 246 36.0 376 a ae 
Avg 531 216 40.7 216 A0.7 Bue 51.4 
1K sales.dbf 894 279 S132 Si. 2: S63 Ar ivS 
800- stokp.dbfi 696 ae A). See oO Ai Se aoe 52 sl 
1200 <acGrr- dbigalZ se 397 3107s o7 312.0 509 3 oe 
Codes aber 1rs2 aie 39°.-6 55 B95 256s allay (0, 
items.dbf 893 300 33. 64500 33.6 3:70 fel 1 
Avg LO23 340 33-2 7560 355.2 4a AS 8 
1.8K claen- dpi WeGe 739 44.4, 4935 14.2 9950 57 ae 
1440— cCust-dadbi 2043 Je 3/736 Jos 372.5 See A Siz 
2160 peopl.dbf 2048 849 41.5 845 41.3 1085 53.56 
Systa- Gdbr loa Sig 2 Sas ores) 25.0 Sse 5 6 
StCGCK,. dot 1664 499 30.0 499 30..0 Gaze 39 
Avg L738 648 36.) 8646 36.0 846 GS 
3K Contec abr 2s 04 826 35.9 S2e 35.9 2FoSs> 45:26 
2400—-CuSsStGeadbr 2ZEEG LA'S 44.8 1189 44-56 ia 6 5 oane 
3600 \imvenmeconmecs 7 Pez 29.6 07 02 29.56. Ga 3 Sices 
hal3k.dbf 3268 IS 7 892424230 39.2 1499 450 
61 ik, Clene 82.02 849 26.5 ee 6 26.1. Beis 3a 
Avg 27 C2 oF 2 35.2 Jog 35.0 Ieo 4225 
5K googdedbr ys ilz0 984 19.259 19.0 or 24.9 
ak names.dbf 4096 1938 A7.3 S29 AF. Zilles 525 
6K sysco.dbf 5586 808 14. 53200 14.5 tae0Z2 23m 
dba4.dbf 4969 Loy 7 2) «ad peso 27-26 Wess 3 3x9 
ha5k.dadbf 5296 is 3 36.5 8iez 36.3 2065 A Ome 
Avg S10) 5 2 1408 28.0 Eraos 2712.9 yes 3535 
8k sySel.dbft 7esu: ee 15.0 Des 14,9 2473 1836 
6400-— @baZz sabi 7842 2088 26.6) 2072) 26:4 2445 3 ee 
9600 Sk 1 2dibf saga 1601 19.5 segs 19.4 2a 2 ie 
hal8sk.dbf 8260 2004 35.1 Begs 3540 382 Sore 
Skw2. dbf area TS 72 19.2 £5538 IS BP, Oe! Sis) 2 
AVg 8064 Meo 7 23.2) 1357 23 -09Zae 1 26.3 
13K empWo-dbr 12286 3403 214i BS S59 27 «3. Ooeks wo lle 
10.4- dbal.dbf 12639 S15 & 2 SO 30 3G 24085 32 23m 
15°6° Of TC “dbr = ieee 352 2 31. 35S S00 31.1 S387 55.3 


Mes. abt 13252 4409 O34..5 74373 25.0. A270 Sola. 
Quek dbf 13453 W659 Jers ivessml ILA BJs 14.3 
Avg W2579 575.0 25.703 164 Pom 3620 28.8 
2 OK Geass eador 19245 A053 Pale le 3 6 Dalen) 2d & 2 7 ag ae © 
16K- ofill.dbf 16274 3953 EE SOs! 2420 2439 as es 
woe rOrdi?: abf 20102 4339 2b. 6 e429 7 Pina ao) 3 24..4 
PADIS) a élone 20256 352.33 MG ©. 9322 1.2 iS.9 24245 le, 
le Oke dbr 20272 6446 Ble: 64 39 31.8 6988 BAS 
Avg 19230 ATG) S. PLD IS eels 8 a ie: Zeeo aoc | eS 
meonK h40k.dbf 40240 12198 BO. 3 ees O04 BeOS ce 2 Deas 
Ger haddOk.dbf 40240 ike oS BO), 4 22296 Bie G. Loewe | SES 
48K alk o2. abt 2354 She) FA} ia. / soo 4 ieee Oe oe Lowes 
40k 1.dbf 20354 5964 es Fics oles) ae oO eS. 7 
Avg 40297 9081 2) Ia eS) IAS) Ce eo ae Cae 
moK h/7Ok.db£& 70192 20647 29% A327 O25 30,.0° 222056 B44 
Bok-~ ha/Ok.dbf 70192 2OG2 5 29.4 20979 29. 9) 2 oo S19 
84K MOK wee cb O50) 9795 IS oe 2) Shell Va. Oa 2 7 iA. Ss 
(Oke clot 7.0538 0 9943 ia 1 OO s6 a3 US S.6 TS .20 
Avg 70 ol 5? 53 PANG lllarsy Oe 220) Ese 6 3 EAS es 
Merk =Hl2Z0.ab£r 120190 34692 OL eres Gers Es. ZOO. 3796 1 B.26 
Bon=- Mal2ZO.dadbf 120190 34707 28.9 35609 2a? 305 7 Beles 7 
Pomel a@br PT20802 16471 oeON ow G7 seo 1747 4 kA eS 
moe Get 120802 16417 3 602 669 5 io. Skea 3 4 1 ae 
Avg WZ0496 25572 Zio Zo 7 Dale 2S / 250 
mom hlig0.dbr£ 190156 54549 Piss? sa OU4 2 POE ae oa / Sales: 
Mere -190K2.dab£f 191170 25518 ee Saee Gal 2S We. 2727 4 ses: 
peek 190K1.dbf 191170 25516 ie oo ee Sel eee 2 Oo 7 eee 
Avg mIOS32 35194 Ve. 4 36092 Seo 37-97 3 19.9 
Boom SOOK2.dbf 301762 39899 Diese A OO 136 425186 eAles ck 
240K-300kK1.ab£f 301762 39984 Se 3. A097 3 S364 246 2 ile ee E 
360K Avg perky Gol 39942 eg cee OOS 1. te 6942490 sllecinver li 
Beom SOO0OK2.dabE~ 502818 65854 i367 Lo 35 C022 3 EAS 20 
400K-500k1.dbf 502818 66134 NSE iS Fas) ORs: ieee S 7.02 | 1 gecis2) 
600K Avg 502818 65994 ees 679.64 bere 5. 702 |. / Le 6, 
800K Zanpeo. dbf Gey 554 eos 26.5 26019? 27.0 290739 30.1 
Peon GoOK2 abt 804450 104979 13.0 107826 13.4 111587 13.9 
Pooler oOrmimnaot eo oso NO5l41 13.1 707873 13.4 111703 13.9 
Avg Caco leeleoe '6 18.2 'MSG630 18.5 171343 20.0 


Total 5. 03 aap 1,051,036. il) 0@@e 752 1,144,139 
Ratio 100 % 7 ae Ve 0 = 1943 a2 
Table 8 COMPRESSED AND ARCHIVED, IMAGE FILES 


** For various sizes of Image(Graphic) Files, Compressed and 
Archived 


File Size Image        Original  ARJ221A        LHA213         PAK251
                                    Size     %     Size     %     Size     %
0.5K      augus.svg         494      [?]   [?]      [?]   [?]      [?]   [?]
400-      bushh.svg         494      [?]   [?]      [?]   [?]      [?]   [?]
600       pebbl.svg         494      [?]   [?]      [?]   [?]      [?]   [?]
          Guchk. tf         [?]      326  54.4      326  54.4      380   [?]
          compa             460      [?]  47.6      219  47.6      [?]  64.1
Avg                         508      [?]  34.8      177  34.8      206  40.6
1K        free1.wpg        1210      612  50.6      612  50.6      767  63.4
800-      mktbl            1044      [?]   [?]      [?]   [?]      774  74.1
1200      qrigt.f           877      455  51.9      [?]   [?]      556  63.4
          pacp.f            [?]      433  48.5      433  48.5      [?]   [?]
          pglab.f           924      [?]   [?]      [?]  43.0      534   [?]
Avg                         [?]      502   [?]      [?]   [?]      [?]   [?]
1.8K      free3.wpg        1422      [?]   [?]      [?]   [?]      [?]   [?]
1440-     fhvst.wpg        1916     1030   [?]      [?]   [?]      [?]   [?]
2160      free5.wpg        1644      689  41.9      [?]  41.9      [?]   [?]
          free6.wpg        1618      [?]  45.8      741  45.8     1040  64.3
Avg                        1650      783   [?]      [?]   [?]      [?]   [?]
3K        snow.rf          3478      [?]   [?]      [?]   [?]      [?]  43.9
2400-     patti.shp        2432      676  27.8      [?]  27.8      969  39.8
3600      headc.           2842     1304  45.9     1304  45.9     1535  54.0
          metal            2556     1324   [?]      [?]   [?]     1433   [?]
          fjamm.wpg        2412      [?]   [?]      [?]   [?]     1560  64.7
          grap2.wpg        3558     1126  31.6     1124  31.6     1541  43.3
Avg                        2880     1140  39.6      [?]  39.6     1436  49.9
5K        verti.vrs        4945      [?]   [?]      [?]   [?]      [?]   [?]
4K-       fonti.shp        4096      [?]   [?]      [?]   [?]      [?]  47.3
6K        haal.dwg         4368      [?]   [?]      [?]   [?]      [?]  44.8
          erollc            [?]      [?]  41.6     2436  41.6     2678  45.7
          Garftisimi’      4961      [?]   [?]     2275   [?]      [?]   [?]
          grope.f          4538     1666   [?]      [?]   [?]      [?]   [?]
Avg                        4795      [?]   [?]      [?]   [?]      [?]  44.8
8K        e3830.dwg        8464      [?]   [?]      [?]   [?]      [?]   [?]
6400-     etbl              [?]      [?]   [?]      [?]   [?]      [?]  42.0
9600      amdri.f          8942      [?]   [?]      [?]   [?]      [?]   [?]
          sSunvi.sha       7685      [?]   [?]      [?]   [?]      [?]  40.7
          syn.me           9147     2473  27.0      [?]  26.9     2764  30.2
          teapo             [?]      [?]   [?]      [?]  32.6      [?]  37.4
Avg                        8474     2704   [?]      [?]   [?]     3074   [?]



File Size Image        Original  ARJ221A        LHA213         PAK251
                                    Size     %     Size     %     Size     %
13K       ar@h.dat        13717      [?]   [?]      [?]  18.8     3261   [?]
10.4-     bearl.ri        15200      [?]  22.4     3342  22.0      [?]   [?]
15.6      geniu.vrs         [?]      [?]   [?]      [?]  28.6      [?]   [?]
          main.shp        11264     4694  41.7     4684  41.6      [?]   [?]
          thesi.dwg       11984     4321  36.1      [?]   [?]      [?]  41.0
Avg                       12905     3706  28.7      [?]   [?]     4426   [?]
20K       clown.ri          [?]      [?]   [?]      [?]   [?]     6379   [?]
16K-      Gurk2 ck          [?]      [?]   [?]      [?]  44.9      [?]   [?]
24K       aero.eps        21577     6209  28.8     6243   [?]      [?]   [?]
          bord *Ship      20608     7250  35.2      [?]   [?]     8231  39.9
          tsail.dwg       18688     7625  40.8     7584  40.6      [?]  46.2
Avg                         [?]      [?]   [?]     7024   [?]      [?]  40.9
40K       golf.date         [?]      [?]   [?]      [?]  12.4     8166   [?]
32K-      birds.rf        47865    16217  33.9      [?]   [?]      [?]   [?]
48K       scree.rf        38147     2263   5.9     2209   5.8      [?]   [?]
          sql.sha         44976    11727  26.1    11888  26.4      [?]   [?]
          [?]               [?]      [?]  12.6      [?]   [?]      [?]   [?]
Avg                       42385     8069   [?]     8026  18.9      [?]   [?]
70K       slib.shp        70400    39370  55.9    38899  55.3    41342   [?]
56K-      bv.sr           84432    63456  75.2    63601  75.3      [?]   [?]
84K       bfg.sr          77089    59117  76.7    59088  76.6      [?]   [?]
          show            82177    29309  35.7    29616  36.0    31653   [?]
          Jails           82174    29839   [?]      [?]  36.7    32405   [?]
Avg                       79254    44218  55.8    44274  55.9    46060   [?]
120K      augus.m18      111864    24492  21.9      [?]  21.6    29938   [?]
96K-      bush.m18       111864    22129  19.8      [?]   [?]      [?]   [?]
144K      peb.m18        111864    22025  19.7      [?]   [?]    27546   [?]
          lenno.iml      129632    32916  25.4    31792  24.5      [?]   [?]
          movie           98563    36348  36.9    36781  37.3      [?]   [?]
          Space.iml      129700    20063  15.5    19162   [?]    23744   [?]
Avg                      115581    26329  22.8    25907  22.4    30524   [?]
190K      plant.mif      177980    25647  14.4      [?]   [?]      [?]   [?]
152K-     bay 2. Chad    228779      [?]   5.9   109041  47.7      [?]   [?]
228K      img5.rgb       223800   166283  74.3   169196  75.6   174085   [?]
          img9.rgb       200427   121321  60.5      [?]   [?]   124727   [?]
          img14.eps      176370    67453  38.2    67509  38.3    70915  40.2
Avg                      201475    78837  39.1      [?]   [?]      [?]   [?]
300K      ad.eps         320174    98293   [?]    98369   [?]      [?]   [?]
240K-     img13.rle      243696   128686  52.8   130106   [?]      [?]   [?]
360K      bdy.cbd        261981   128032  48.9   128538  49.1      [?]   [?]




File Size Image        Original  ARJ221A        LHA213         PAK251
                                    Size     %     Size     %     Size     %
          [?]               [?]      [?]   [?]      [?]  36.9   146741  40.5
Avg                         [?]      [?]   [?]   122695  41.3   130067  43.8
[?]       P44CQt.scr     494267   192322  38.9   194529  39.4   207753  42.0
[?]       aoO02.eps      526772   157291  29.9   156857  29.8   166791  31.7
[?]       oOovet.scr     471937   159396  33.8   162441  34.4   171892  36.4
          [?]               [?]      [?]  63.5   274615  64.7   283949  66.9
Avg                         [?]      [?]  40.6   197111  41.1   207596  43.3
[?]       bigk.scr       767399   225720  29.4   234851  30.6   249155  32.5
[?]       ball.scr       742684   355815  47.9   364433  49.1   379561  51.1
[?]       6hbeac.scr     803894   485758  60.4   493360  61.4   562386  69.9
          [?]               [?]      [?]  61.5   604234  62.9   658037  68.5
          [?]               [?]   754167  70.5   775119  72.4   825469  77.1
Avg                         [?]      [?]  55.5   494399  56.9   534922  61.6

Total                       [?]      [?]            [?]            [?]
Ratio                     100 %      [?]         47.4 %            [?]
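
In the rows of Table 8 that survive intact, each percentage agrees with the compressed size divided by the original size, multiplied by 100 and rounded to one decimal place. The following is a minimal sketch of that check, assuming this reading of the columns is correct; the function name `compression_ratio` is hypothetical and is not part of any of the packages tested. The figures are taken from the img14.eps row.

```python
def compression_ratio(original_size: int, compressed_size: int) -> float:
    """Return the compressed size as a percentage of the original size.

    Assumption: the table's "%" columns are computed this way and rounded
    to one decimal place; lower values mean better compression.
    """
    if original_size <= 0:
        raise ValueError("original_size must be positive")
    return round(100.0 * compressed_size / original_size, 1)


# img14.eps row of Table 8: original size and the size produced by each package.
original = 176370
compressed = {"ARJ221A": 67453, "LHA213": 67509, "PAK251": 70915}

for package, size in compressed.items():
    print(f"{package}: {compression_ratio(original, size)} %")
# ARJ221A: 38.2 %
# LHA213: 38.3 %
# PAK251: 40.2 %
```

The same function reproduces the grap2.wpg row (3558 bytes reduced to 1126 bytes gives 31.6 %), which suggests the interpretation holds at least for the rows checked.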
APPENDIX C. EXECUTION TIME COMPARISON